(54) Title: THE C. ELEGANS GRO-1 GENE

(57) Abstract 

The invention relates to the identification of the gro-1
gene and to the demonstration that the gro-1 gene is
involved in the control of a central physiological clock.
Also disclosed are four other genes located within the same
operon as the gro-1 gene.

THE C. ELEGANS GRO-1 GENE

BACKGROUND OF THE INVENTION 

(a) Field of the Invention 

The invention relates to the identification of the
gro-1 gene and four other genes located within the same
operon, and to the demonstration that the gro-1 gene is
involved in the control of a central physiological clock.

(b) Description of Prior Art 

The gro-1 gene was originally defined by a
spontaneous mutation isolated from a Caenorhabditis
elegans strain that had recently been established from
a wild isolate (J. Hodgkin and T. Doniach, Genetics
146: 149-164 (1997)). We have shown that the activity
of the gro-1 gene controls how fast the worms live and
how soon they die. The time taken to progress through
embryonic and post-embryonic development, as well as
the life span of gro-1 mutants, is increased (Lakowski
and Hekimi, Science 272:1010-1013 (1996)). Furthermore,
these defects are maternally rescuable: when
homozygous mutants (gro-1/gro-1) derive from a
heterozygous mother (gro-1/+), these animals appear to
be phenotypically wild-type. The defects are seen only
when homozygous mutants derive from a homozygous mother
(Lakowski and Hekimi, Science 272:1010-1013 (1996)).
In general, the properties of the gro-1 gene are similar
to those of three other genes, clk-1, clk-2 and
clk-3 (Wong et al., Genetics 139: 1247-1259 (1995);
Hekimi et al., Genetics 141: 1351-1367 (1995);
Lakowski and Hekimi, Science 272:1010-1013 (1996)),
and this combination of phenotypes has been called the
Clk ("clock") phenotype. All four of these genes
interact to determine developmental rate and longevity
in the nematode. Detailed examination of the clk-1
mutant phenotype has led to the suggestion that there
exists a central physiological clock which coordinates
all or many aspects of cellular physiology, from cell
division and growth to aging. All four genes have a
similar phenotype and thus appear to impinge on this
physiological clock.

It would be highly desirable to be provided with
the molecular identity of the gro-1 gene.

SUMMARY OF THE INVENTION 

One aim of the present invention is to provide
the molecular identity of the gro-1 gene and four other
genes located within the same operon.

In accordance with the present invention there
is provided a gro-1 gene which has a function at the
level of cellular physiology involved in developmental
rate and longevity, wherein gro-1 is located within an
operon and gro-1 mutants have a longer life and an
altered cellular metabolism relative to the wild-type.

In accordance with a preferred embodiment, the
gro-1 gene of the present invention codes for a GRO-1
protein having the amino acid sequence set forth in
Figs. 3A-3B (SEQ ID. NO:2).

The gro-1 gene is located within an operon which
has the nucleotide sequence set forth in SEQ ID NO:1
and which also codes for four other genes, referred to as
the gop-1, gop-2, gop-3 and hap-1 genes.

In accordance with a preferred embodiment, the
gop-1 gene of the present invention codes for a GOP-1
protein having the amino acid sequence set forth in
Figs. 13A-13C (SEQ ID. NO:4).

In accordance with a preferred embodiment, the
gop-2 gene of the present invention codes for a GOP-2
protein having the amino acid sequence set forth in
Fig. 14 (SEQ ID. NO:5).

In accordance with a preferred embodiment, the
gop-3 gene of the present invention codes for a GOP-3
protein having the amino acid sequence set forth in
Figs. 15A-15B (SEQ ID. NO:6).

In accordance with a preferred embodiment, the
hap-1 gene of the present invention codes for a HAP-1
protein having the amino acid sequence set forth in
Fig. 16 (SEQ ID. NO:7).

In accordance with a preferred embodiment of the
present invention, the gro-1 gene is of human origin
and has the nucleotide sequence set forth in Fig. 8
(SEQ ID. NO:3).

In accordance with a preferred embodiment of the
present invention, there is provided a mutant GRO-1
protein which has the amino acid sequence set forth in
Fig. 3C.

In accordance with the present invention there
is also provided a GRO-1 protein which has a function
at the level of cellular physiology involved in developmental
rate and longevity, wherein said GRO-1 protein
is encoded by the gro-1 gene identified above.

In accordance with a preferred embodiment of the
present invention, there is provided a GRO-1 protein
which has the amino acid sequence set forth in Figs.
3A-3B (SEQ ID. NO:2).

In accordance with a preferred embodiment of the
present invention, there is provided a GOP-1 protein
which has the amino acid sequence set forth in Figs.
13A-13C (SEQ ID. NO:4).

In accordance with a preferred embodiment of the
present invention, there is provided a GOP-2 protein
which has the amino acid sequence set forth in Fig. 14
(SEQ ID. NO:5).

In accordance with a preferred embodiment of the
present invention, there is provided a GOP-3 protein
which has the amino acid sequence set forth in Figs.
15A-15B (SEQ ID. NO:6).

In accordance with a preferred embodiment of the
present invention, there is provided a HAP-1 protein
which has the amino acid sequence set forth in Fig. 16
(SEQ ID. NO:7).

In accordance with the present invention there
is also provided a method for the diagnosis and/or
prognosis of cancer in a patient, which comprises the
steps of:

a) obtaining a tissue sample from said patient;

b) analyzing DNA of the obtained tissue sample of
step a) to determine if the human gro-1 gene is
altered, wherein alteration of the human gro-1 gene is
indicative of cancer.

In accordance with the present invention there
is also provided a mouse model of aging and cancer,
which comprises a gene knock-out of the murine gene
homologous to gro-1.

In accordance with the present invention there
is provided the use of compounds interfering with the
enzymatic activity of GRO-1, GOP-1, GOP-2, GOP-3 or HAP-1
for enhancing longevity of a host.

In accordance with the present invention there
is provided the use of compounds interfering with the
enzymatic activity of GRO-1, GOP-1, GOP-2, GOP-3 or HAP-1
for inhibiting tumorous growth.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1A illustrates the genetic mapping of
gro-1;

Fig. 1B illustrates the physical map of the
gro-1 region;

Fig. 2A illustrates cosmid clones able to rescue
the gro-1(e2400) mutant phenotype;

Fig. 2B illustrates the genes predicted by
Genefinder, the relevant restriction sites and the
fragments used to subclone the region;

Figs. 3A-3B illustrate the genomic sequence and
translation of the C. elegans gro-1 gene (SEQ. ID.
NO:2);

Fig. 3C illustrates the predicted mutant protein;

Fig. 4A illustrates the five genes of the gro-1
operon (SEQ. ID. NO:1);

Fig. 4B illustrates the trans-splicing pattern of
the five genes of the gro-1 operon;

Fig. 5 illustrates the alignment of gro-1 with
the published sequences of the E. coli (P16384) and
yeast (P07884) enzymes;

Fig. 6 illustrates the biosynthetic step catalyzed
by DMAPP transferase (MiaAp in E. coli, Mod5p in
S. cerevisiae, and GRO-1 in C. elegans);

Fig. 7 illustrates the alignment of the predicted
HAP-1 amino acid sequence with homologues from
other species;

Fig. 8 illustrates the full mRNA sequence of the
human homologue of gro-1, referred to as hgro-1 (SEQ.
ID. NO:3);

Fig. 9 illustrates a comparison of the
conceptual amino acid sequences for GRO-1 and hgro-1p;

Fig. 10 illustrates a conceptual translation of
a partial sequence of the Drosophila homologue of gro-1
(AA816785);

Fig. 11 illustrates the structure of pMQ8;

Fig. 12 illustrates construction of pMQ18;

Figs. 13A-13C illustrate the genomic sequence
and translation of the gop-1 gene (SEQ. ID. NO:4);

Fig. 14 illustrates the genomic sequence and
translation of the gop-2 gene (SEQ. ID. NO:5);

Figs. 15A-15B illustrate the genomic sequence
and translation of the gop-3 gene (SEQ. ID. NO:6); and

Fig. 16 illustrates the genomic sequence and
translation of the hap-1 gene (SEQ. ID. NO:7).

DETAILED DESCRIPTION OF THE INVENTION 

The gro-1 phenotype

In addition to the previously documented phenotypes,
we recently found that gro-1 mutants were
temperature-sensitive for fertility. At 25°C the progeny
of these mutants is reduced so much that a viable
strain cannot be propagated. In contrast, gro-1
strains can easily be propagated at 15 and 20°C.

We also discovered that the gro-1(e2400) mutation
increases the incidence of spontaneous mutations.
As gro-1(e2400) was originally identified in a
non-standard background (Hodgkin and Doniach, Genetics 146:
149-164 (1997)), we first backcrossed the mutation 8
times against N2, the standard wild type strain. We
then undertook to examine the gro-1 strain and N2 for
the occurrence of spontaneous mutants which could be
identified visually. We focused on the two classes of
mutants which are detected the most easily by simple
visual inspection, uncoordinated mutants (Unc) and
dumpy mutants (Dpy). We examined 8200 wild type worms
and found no spontaneous visible mutant. By contrast,
we found 6 spontaneous mutants among 12500 gro-1
mutants examined. All mutants produced entirely mutant
progeny, indicating that they were homozygous.
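
The difference between the two counts can be checked with a one-sided Fisher exact test. The sketch below is not part of the original analysis; it assumes only the counts given above (no visible mutant among 8200 wild type worms, 6 among 12500 gro-1 worms), and the function name is illustrative.

```python
from math import comb


def fisher_one_sided(mut_a, n_a, mut_b, n_b):
    """Probability of seeing at least mut_a of the mutants in group A
    if the mutants were spread at random over both groups
    (hypergeometric null distribution)."""
    total = n_a + n_b          # all animals examined
    k = mut_a + mut_b          # all mutants observed
    denom = comb(total, k)
    tail = 0
    for x in range(mut_a, min(k, n_a) + 1):
        tail += comb(n_a, x) * comb(n_b, k - x)
    return tail / denom


# Counts taken from the text: group A = gro-1, group B = wild type (N2).
p_value = fisher_one_sided(6, 12500, 0, 8200)
freq_gro1 = 6 / 12500          # observed frequency among gro-1 worms
print(f"frequency in gro-1: {freq_gro1:.2e}, one-sided p = {p_value:.3f}")
```

With these counts the one-sided probability is roughly 0.05, so a larger sample would be needed to estimate the size of the effect with any precision.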
Sequences of all primers used

Primer        Direction     Sequence                                         SEQ ID
[not legible] [not legible] T[??]AAAGGCTCTGGAACTCC                           SEQ. ID. NO:22
SHP129        [not legible] AAAAACCACTTGATATAAGG                             SEQ. ID. NO:23
SHP130        reverse       CATCCAAAAGCAGTATCACC                             SEQ. ID. NO:24
SHP134        forward       TTAATTGGATGCAAGCACCCC                            SEQ. ID. NO:25
SHP135        reverse       ATTACTATACGAACATTTCC                             SEQ. ID. NO:26
SHP138        forward       TTGTAAAGGCGTTAGTTTGG                             SEQ. ID. NO:27
SHP139        forward       CAGGAGTATTTGGTGATGCG                             SEQ. ID. NO:28
SHP140        forward       CGACGGGGAGAAGGTGACGG                             SEQ. ID. NO:29
SHP141        reverse       AAAACTTCTACCAACAATGG                             SEQ. ID. NO:30
SHP142        reverse       CGTAATCTCTCTCGATTAGC                             SEQ. ID. NO:31
SHP143        reverse       CCGTGGGATGGCTACTTGCC                             SEQ. ID. NO:32
SHP144        reverse       TGGATTTGTGGCACGAGCGG                             SEQ. ID. NO:33
SHP145        reverse       TTGATTGCCTCTCCTCGTCC                             SEQ. ID. NO:34
SHP146        reverse       ATCAACATCTGATTGATTCC                             SEQ. ID. NO:35
SHP151        forward       CAGCGAGCGCATGCAACTATATATTGAGCAGG                 SEQ. ID. NO:36
SHP159        forward       AATAAATATTTAAATATTGAGATATACCCTGAACTCTACAG        SEQ. ID. NO:37
SHP160        reverse       AAACTGTAGAGTTCAGGGTATATCTGAATATTTAAATATTTATTC    SEQ. ID. NO:38
SHP161        forward       GTACGTGGAGCTCTGCAACTATATATTGAGCAGG               SEQ. ID. NO:39
SHP162        reverse       ATGACACTGCAGGATAGTTCCCTTCGTTCGGG                 SEQ. ID. NO:40
SHP163        forward       GTGTTGCATCAGTTCATTCC                             SEQ. ID. NO:41
SHP164        forward       GCTGTGCTAGAAGTCAGAGG                             SEQ. ID. NO:42
SHP165        reverse       GTTCTCCTTGGAATTCATCC                             SEQ. ID. NO:43
SHP170        reverse       AGTATATCTAGATGTGCGAGTCTCTG[?]CAATT               SEQ. ID. NO:44
SHP171        reverse       AGTAATTGTACATTTAGTGG                             SEQ. ID. NO:45
SHP172        forward       ATTAACCTTACTTACTTACC                             SEQ. ID. NO:46
SHP173        forward       CTAAACTAAGTAATATAACC                             SEQ. ID. NO:47
SHP174        reverse       GTTGATTCTTTGAGCACTGG                             SEQ. ID. NO:48
SHP175        forward       AATTCGACCAATTACATTGG                             SEQ. ID. NO:49
SHP176        reverse       AACATAGTTGTTGAGGAAGG                             SEQ. ID. NO:50
SHP177        forward       AATTAATGGAGATTCTACGG                             SEQ. ID. NO:51
SHP178        forward       TCAGCATCTAGAAATGCAGG                             SEQ. ID. NO:52
SHP179        reverse       CGAATGTCAACATTCACTGG                             SEQ. ID. NO:53
SHP180        forward       CTTAACCTGATGTGTACTCG                             SEQ. ID. NO:54
SHP181        forward       ATGAAGCTTTAGAGGATGCC                             SEQ. ID. NO:55
SHP182        forward       CGACGAATTTCTGGAGTCGG                             SEQ. ID. NO:56
SHP183        reverse       ACTGCATTATCCATTAATCC                             SEQ. ID. NO:57
SHP184        reverse       CACCCAAATAACATCTATCC                             SEQ. ID. NO:58
SHP185        forward       TTTAACCTCATCTTCGCTGG                             SEQ. ID. NO:59
SHP190        forward       ATGTTCCGCAAGCTTGGTTC                             SEQ. ID. NO:60
SL1           forward       TTTAATTACCCAAGTTTGAG                             SEQ. ID. NO:61
SL2           forward       TTTTAACCCAGTTACTCAAG                             SEQ. ID. NO:62



Positional cloning of gro-1 

gro-1 lies on linkage group III, very close to
the gene clk-1. To genetically order gro-1 with
respect to clk-1 on the genetic map, 54 recombinants in
the dpy-17 to lon-1 interval were selected from among
the self progeny of a strain which was unc-79(e1030) +
+ clk-1(e2519) lon-1(e678) + / + dpy-17(e164)
gro-1(e2400) + sma-4(e729). Three of these showed neither
the Gro-1 nor the Clk-1 phenotypes, but carried unc-79
and sma-4, indicating that these recombination events
had occurred between gro-1 and clk-1. From the disposition
of the markers, this showed that the gene order
was dpy-17 gro-1 clk-1 lon-1, and the frequency of
events indicated that the gro-1 to clk-1 distance was
0.03 map units. In this region of the genome, this
corresponds to a physical map distance of ~20 kb.

Several cosmids containing wild-type DNA spanning
this region of the genome were tested by microinjection
into gro-1 mutants for their ability to complement
the gro-1(e2400) mutation (Fig. 1). gro-1 was
mapped between dpy-17 and lon-1 on the third chromosome,
0.03 m.u. to the left of clk-1 (Fig. 1A).

Based on the above genetic mapping, gro-1 was
estimated to be approximately 20 kb to the left of
clk-1. Eight cosmids (represented by medium bold lines)
were selected as candidates for transformation rescue
(Fig. 1B). Those which were capable of rescuing the
gro-1(e2400) mutant phenotype are represented as heavy
bold lines (Fig. 1B).

Of these, only B0498, C34E10 and ZC395 were able
to rescue the mutant phenotype. Transgenic animals
were fully rescued for developmental speed. In
addition, the transgenic DNA was able to recapitulate
the maternal rescue seen with the wild-type gene, that
is, mutants not carrying the transgenic DNA but derived
from transgenic mothers display a wild type phenotype.
The 7 kb region common to the three rescuing cosmids
had been completely sequenced, and this sequence was
publicly available.

We generated subclones of ZC395 and assayed them
for rescue (Fig. 2). The common 6.5 kb region is
enlarged in part B. B0498 has not been sequenced;
its ends therefore cannot be positioned and are
represented by arrows.

One subclone, pMQ2, spanned 3.9 kb and was also
able to completely rescue the growth rate defect and
recapitulate the maternal effect. The sequences in
pMQ2 potentially encode two genes. However, a second
subclone, pMQ3, which contained only the first of the
potential genes (named ZC395.7 in Fig. 2A), was unable
to rescue.

Furthermore, frameshifts which would disrupt
each of the two genes' coding sequences were constructed
in pMQ2 and tested for rescue. Disruption of
the first gene (in pMQ4) did not eliminate rescuing
ability, but disruption of the second gene (in pMQ5)
did. This indicates that the gro-1 rescuing activity is
provided by the second predicted gene.

pMQ2 was generated by deleting a 29.9 kb SpeI
fragment from ZC395, leaving the left-most 3.9 kb
region containing the predicted genes ZC395.7 and
ZC395.6 (Fig. 2B). pMQ3 was created in the same fashion,
by deleting a 31.4 kb NdeI fragment from ZC395,
leaving only ZC395.7 intact. In pMQ4, a frameshift was
induced in ZC395.7 by degrading the 4 bp overhang of
the ApaI site. A frameshift was also induced in pMQ5
by filling in the 2 bp overhang of the NdeI site found
in the second exon of ZC395.6. These frameshifts
presumably abolish any function of ZC395.7 and ZC395.6
respectively. The dotted lines represent the extent of
frameshift that resulted from these alterations.
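
The reasoning behind the subclone experiments can be written as a small consistency check. This is only an illustrative sketch: it assumes that a single gene carries the rescuing activity, and the dictionary layout and function name are not from the source. The construct contents and rescue results are those stated in the text.

```python
# For each construct: the set of genes left functionally intact,
# and whether the construct rescued the gro-1(e2400) phenotype.
constructs = {
    "pMQ2": ({"ZC395.7", "ZC395.6"}, True),
    "pMQ3": ({"ZC395.7"}, False),          # NdeI deletion, only ZC395.7 intact
    "pMQ4": ({"ZC395.6"}, True),           # frameshift in ZC395.7
    "pMQ5": ({"ZC395.7"}, False),          # frameshift in ZC395.6
}


def consistent_candidates(constructs):
    """Return the genes whose presence matches rescue in every construct.

    Assumption (not from the source): exactly one gene is responsible,
    so a candidate must be intact in all rescuing constructs and
    disrupted or absent in all non-rescuing ones.
    """
    genes = set().union(*(intact for intact, _ in constructs.values()))
    return sorted(
        gene
        for gene in genes
        if all((gene in intact) == rescued
               for intact, rescued in constructs.values())
    )


print(consistent_candidates(constructs))
```

Only ZC395.6, the second predicted gene, is consistent with all four results.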

To establish the splicing pattern of this gene,
cDNAs encompassing the 5' and 3' halves of the gene
were produced by reverse transcription-PCR and
sequenced (Fig. 3).

This revealed that the gene is composed of 9
exons, spans ~2 kb, and produces an mRNA of 1.3 kb. To
confirm that this is indeed the gro-1 gene, genomic DNA
was amplified by PCR from a strain containing the
gro-1(e2400) mutation and the amplified product was
sequenced. A lesion was found in the 5th exon, where a
9 base-pair sequence has been replaced by a 2 base-pair
insertion, leading to a frameshift (Fig. 3C). In Fig. 3C,
those residues which differ from wild type
are in bold.

The reading frame continues out-of-frame for 
another 33 residues before terminating. 
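
Why this lesion shifts the reading frame follows from simple codon arithmetic, sketched below. The function name is illustrative; the 9 bp and 2 bp figures are the ones reported for e2400.

```python
def is_frameshift(deleted_bp, inserted_bp):
    """A lesion shifts the reading frame when the net change in length
    is not a whole number of codons (a multiple of 3)."""
    return (inserted_bp - deleted_bp) % 3 != 0


# e2400: 9 bp removed, 2 bp inserted, net change of -7 bp.
net_change = 2 - 9
print(net_change, is_frameshift(deleted_bp=9, inserted_bp=2))
```

A clean 9 bp deletion alone would have removed three codons and kept the frame; it is the 2 bp insertion that makes the net change (7 bp) indivisible by 3.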

Figs. 3A-3B illustrate the coding sequence in
capital letters, while the introns, and the untranslated
and intergenic sequence are in lower case letters.
The protein sequence is shown underneath the
coding sequence. Position 1 of the nucleotide sequence
is the first base after the SL2 trans-splice acceptor
sequence. Position 1 of the protein sequence is the
initiator methionine. All PCR primers used for genomic
and cDNA amplification are represented by arrows. For
primers extending downstream (arrows pointing right)
the primer sequence corresponds exactly to the
nucleotides over which the arrow extends. But for primers
extending upstream (arrows pointing left) the primer
sequence is actually the complement of the sequence
under the arrow. In both cases the arrow head is at
the 3' end of the primer. The sequences of the two
primers which flank gro-1 (SHP93 and SHP92) are not
represented in this figure. Their sequences are: SHP93
TTTCTGGATTTTAACCTTCC (SEQ. ID. NO:10) and SHP92
GATAGTTCCCTTCGTTCGGG (SEQ. ID. NO:9). The wild type
splicing pattern was determined by sequencing of the
cDNA. Identification of the e2400 lesion was
accomplished by sequencing the e2400 allele. The e2400
lesion consists of a 9 bp deletion and a 2 bp insertion
at position 1196, resulting in a frameshift.
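
The relationship between an upstream-extending primer and the sequence drawn under its arrow is a reverse complement. A minimal sketch follows, using SHP142, which is listed as a reverse primer in the primer table; the helper name is illustrative and only the four unambiguous bases are handled.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


shp142 = "CGTAATCTCTCTCGATTAGC"           # reverse primer, 5' to 3'
top_strand_site = reverse_complement(shp142)
print(top_strand_site)                     # sequence as it reads on the displayed strand
```

Applying the function twice returns the original primer, which is a quick sanity check when transcribing primers from a figure.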
gro-1 is part of a complex operon (Figs. 3A-3B)

Amplification of the 5' end of gro-1 from cDNA
occurred only when the trans-spliced leader SL2 was
used as the 5' primer, and not when SL1 was used. SL2
is used for trans-splicing to the downstream gene when
two genes are organized into an operon (Spieth et al.,
Cell 73: 521-532 (1993); Zorio et al., Nature 372: 270-272
(1994)). This indicates that at least one gene
upstream of gro-1 is co-transcribed with gro-1 from a
common promoter. We found that sequences from the 5'
end of the three next predicted genes upstream of gro-1
(ZC395.7, C34E10.1, and C34E10.2) all could only be
amplified with SL2. Sequences from the fourth
predicted upstream gene (C34E10.3), however, could be
amplified with neither spliced leader, suggesting that
it is not trans-spliced. The distance between genes in
operons appears to have an upper limit (Spieth et al.,
Cell 73: 521-532 (1993); Zorio et al., Nature 372: 270-272
(1994)), and no gene is predicted to be close
enough upstream of C34E10.3 or downstream of gro-1 to
be co-transcribed with these genes. Our findings
therefore suggest that gro-1 is the last gene in an operon
of five co-transcribed genes (Fig. 4).
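
The inference from the spliced-leader PCR results can be sketched as follows. The amplification results are those reported above; the classification labels, data layout and function names are illustrative assumptions, and the rule that a run of SL2-only genes plus the first non-SL2 gene upstream forms one operon is a simplification of the argument in the text (it ignores the intergenic-distance criterion).

```python
# Genes in chromosomal order, upstream to downstream.
gene_order = ["C34E10.3", "C34E10.2", "C34E10.1", "ZC395.7", "gro-1"]

# Whether a product was obtained with each spliced-leader primer.
pcr = {
    "C34E10.3": {"sl1": False, "sl2": False},
    "C34E10.2": {"sl1": False, "sl2": True},
    "C34E10.1": {"sl1": False, "sl2": True},
    "ZC395.7":  {"sl1": False, "sl2": True},
    "gro-1":    {"sl1": False, "sl2": True},
}


def classify(sl1, sl2):
    """Label a gene by which spliced leader its 5' end amplifies with."""
    if sl2 and not sl1:
        return "SL2 only"          # downstream gene of an operon
    if sl1 and not sl2:
        return "SL1 only"
    if sl1 and sl2:
        return "SL1 and SL2"
    return "neither"               # not trans-spliced


def infer_operon(gene_order, pcr):
    """Walk upstream from the last gene, collecting SL2-only genes and
    stopping at (and including) the first gene that is not SL2-only."""
    operon = []
    for gene in reversed(gene_order):
        operon.append(gene)
        if classify(**pcr[gene]) != "SL2 only":
            break
    return operon[::-1]


print(infer_operon(gene_order, pcr))
```

Run on the reported results, the sketch returns all five genes, with C34E10.3 as the first gene and gro-1 as the last.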

Nested PCR was used to amplify the 5' end of
each gene. SL1 or SL2 specific primers were used in
conjunction with a pair of gene-specific primers. cDNA
generated by RT-PCR using mixed stage N2 RNA was used
as template in the nested PCR. Fig. 4A illustrates a
schematic of the gro-1 operon showing the coding
sequences of each gene and the primers (represented by
flags) used to establish the trans-splicing patterns.

Fig. 4B illustrates the products of the PCR with
SL1 and SL2 specific primers for each of the five
genes. The sequences of the primers used are as follows:
SL1: TTTAATTACCCAAGTTTGAG (SEQ. ID. NO:61),
SL2: TTTTAACCCAGTTACTCAAG (SEQ. ID. NO:62),
SHP141 AAAACTTCTACCAACAATGG (SEQ. ID. NO:30),
SHP142 CGTAATCTCTCTCGATTAGC (SEQ. ID. NO:31),
SHP143 CCGTGGGATGGCTACTTGCC (SEQ. ID. NO:32),
SHP144 TGGATTTGTGGCACGAGCGG (SEQ. ID. NO:33),
SHP145 TTGATTGCCTCTCCTCGTCC (SEQ. ID. NO:34),
SHP146 ATCAACATCTGATTGATTCC (SEQ. ID. NO:35),
SHP130 CATCCAAAAGCAGTATCACC (SEQ. ID. NO:24),
SHP119 ACATCTTTATCCATTTCTCC (SEQ. ID. NO:21),
SHP95 TACAGGAATTTTTGAACGGG (SEQ. ID. NO:12),
SHP99 ATCGATACCACCGTCTCTGG (SEQ. ID. NO:16).

The gene immediately upstream of gro-1 has
homology to the yeast gene HAM1, and we have renamed
the gene hap-1. We have established its splicing pattern
by reverse transcription PCR and sequencing. This
revealed that hap-1 is composed of 5 exons and produces
an mRNA of 0.9 kb. We also found that sequences which
were predicted to belong to ZC395.7 (now hap-1) are in
fact spliced to the exons of C34E10.1. This is consistent
with our finding that hap-1 is SL2 spliced, as it
puts the end of C34E10.1 very close to the start of
hap-1 (Fig. 4).
The gro-1 gene product 

Conceptual translation of the gro-1 transcript
indicated that it encodes a protein of 430 amino acids
highly similar to a strongly conserved cellular enzyme:
dimethylallyldiphosphate:tRNA dimethylallyltransferase
(DMAPP transferase). Fig. 5 shows an alignment of gro-1
with the published sequences of the E. coli (P16384)
and yeast (P07884) enzymes. Residues where the
biochemical character of the amino acids is conserved
are shown in bold. Identical amino acids are indicated
further with a dot. The ATP/GTP binding site and the
C2H2 zinc finger site are predicted and not
experimental. The point at which the gro-1(e2400)



WO 99/10482 



- 14 - 



PCT/CA98/00803 



mutation alters the reading frame of the sequence is 
shown. The two alternative initiator methionines in 
the yeast sequence, and the putative corresponding 
methionines in the worm sequence, are underlined. 
Database searches also identified a homologous
human expressed sequence tag (Genbank ID: Z40724). The
human clone has been used to derive a sequence tagged
site (STS). This means that the genetic and physical
position of the human gro-1 homologue is known. It
maps to chromosome 1, 122.8 cR from the top of the Chr 1
linkage group and between the markers D1S255 and
D1S2861. This information was found in the UniGene
database of the National Center for Biotechnology
Information (NCBI). We have sequenced Z40724 by
classical methods but found that Z40724 is not a full
length cDNA clone, as it contains neither an initiator
methionine nor the poly A tail. We used the sequence of
Z40724 to identify further clones by database searches.
We found one clone (Genbank ID: AA332152) which
extended the sequence 5' by 28 nucleotides, as well as
one clone (Genbank ID: AA121465) which extended the
sequence substantially in the 3' direction but did not
include the poly A tail. We then used AA121465 to
identify an additional clone (AA847885) extending the
sequence to the poly A tail. Fig. 8 shows the full
sequence with the putative initiator ATG shown in bold
and the sequence of Z40724 shown underlined. A
comparison of the conceptual amino acid sequences for
GRO-1 and hgro-1p is shown in Fig. 9. Amino acid
identities are indicated by a dot. Both sequences
contain a region with a zinc finger motif which is
shown underlined.

An additional metazoan homologue is represented
by a Drosophila EST (Genbank accession: AA816785). In
E. coli and other bacteria, the gene encoding DMAPP
transferase is called miaA (a.k.a. trpX); in yeast it is
called mod5. DMAPP transferase catalyzes the modification
of adenosine 37 of tRNAs whose anticodon begins with U
(Fig. 6).

In these organisms the enzyme has been shown to
use dimethylallyldiphosphate as a donor to generate
dimethylallyl-adenosine (dma6A37), one base 3' to the
anticodon (for review and biochemical characterization
of the bacterial enzyme see Persson et al., Biochimie
76: 1152-1160 (1994); Leung et al., J Biol Chem 272:
13073-13083 (1997); Moore and Poulter, Biochemistry
36: 604-614 (1997)). In earlier literature this
modification is often referred to as isopentenyl adenosine
(i6A37).

The high degree of conservation of the protein
sequence between GRO-1 and DMAPP transferase in
S. cerevisiae and E. coli suggests that GRO-1 possesses
the same enzymatic activity as the previously
characterized enzymes. The sequence contains a number of
conserved structural motifs (Fig. 5), including a region
with an ATP/GTP binding motif which is generally referred
to as the 'A' consensus sequence (Walker et al., EMBO J
1: 945-951 (1982)) or the 'P-loop' (Saraste et al.,
Trends Biochem Sci 15: 430-434 (1990)).

In addition, at the C-terminal end of the GRO-1
sequence, there is a C2H2 zinc finger motif as defined
by the PROSITE database. This type of DNA-binding
motif is believed to bind nucleic acids (Klug and
Rhodes, Trends Biochem Sci 12: 464-469 (1987)).
Although there appears to be some conservation between
the worm and yeast sequences in the C-terminal end of
the protein (Fig. 5), including in the region
encompassing the zinc finger in GRO-1, the zinc finger motif
per se is not conserved in yeast but is present in
humans (Fig. 9).

In yeast, DMAPP transferase is the product of the
MOD5 gene, and exists in two forms: one form which is
targeted principally to the mitochondria, and one form
which is found in the cytoplasm and nucleus. These two
forms differ only by a short N-terminal sequence whose
presence or absence is determined by differential
translation initiation at two "in frame" ATG codons
(Gillman et al., Mol & Cell Biol 11: 2382-90 (1991)).
The gro-1 open reading frame also contains two ATG
codons at comparable positions, with the coding
sequence between the two codons constituting a plausible
mitochondrial sorting signal (Figs. 3 and 5). It is
likely therefore that DMAPP transferase in worms also
exists in two forms, mitochondrial and cytoplasmic.
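
The idea of two protein forms arising from two in-frame ATG codons can be illustrated with a short scan of an open reading frame. The sequence below is a made-up toy example, not the gro-1 or MOD5 sequence, and the function name is illustrative.

```python
def in_frame_atgs(orf):
    """Return the 0-based nucleotide positions of ATG codons that lie in
    the same reading frame as the first base of the ORF."""
    orf = orf.upper()
    return [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] == "ATG"]


# Hypothetical ORF: ATG CTT CGT AGA ATG GGA AAA TAA
toy_orf = "ATGCTTCGTAGAATGGGAAAATAA"
starts = in_frame_atgs(toy_orf)

# Initiation at the first ATG gives the long form; initiation at the
# second ATG gives a short form lacking the N-terminal extension.
long_form = toy_orf[starts[0]:]
short_form = toy_orf[starts[1]:]
extension_codons = (starts[1] - starts[0]) // 3
print(starts, extension_codons)
```

In the yeast case described above, the stretch between the two start codons is what distinguishes the mitochondrially targeted form from the cytoplasmic and nuclear form.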

It should be noted, however, that the sequence
of hgro-1 shows only one in-frame methionine before the
conserved ATP/GTP binding site (Fig. 9). As we cannot
be assured to have determined the sequence of the full
length transcript, it is possible that further 5'
sequence might reveal an additional methionine.
Alternatively, in humans, the mechanism by which the
enzyme is targeted to several compartments might not
involve differential translation initiation. In this
context, it should be noted that the sorting signals
which can be predicted from the sequence of hgro-1p are
predicted to be highly ambiguous by the prediction
program PSORT II. Furthermore, a conceptual translation
of the Drosophila sequence (AA816785) predicts only one
initiator methionine before the ATP/GTP binding site as
well as several in-frame stop codons upstream of this
start (Fig. 10), suggesting that no additional upstream
ATG could serve as translation initiation site. In the
figure, stop codons are indicated by stop, methionines
are indicated by Met, and the conserved ATP/GTP binding
site is underlined.






Expression pattern of GRO-1 

We have also constructed a reporter gene
expressing a fusion protein containing the entire GRO-1
amino acid sequence fused at the C-terminal end to
green fluorescent protein (GFP). The promoter of the
reporter gene is the sequence upstream of gop-1
(Figs. 13A-13C), the first gene in the operon (see
Fig. 4). The promoter sequence is 306 bp long, starting
32 nucleotides upstream of the gop-1 ATG. It is fused
at the exact level upstream of gro-1 where
trans-splicing to SL2 normally occurs.

The genes gop-2 (Fig. 14) and gop-3 (Figs. 15A-15B)
are also located in the operon (see Fig. 4), as the
second and third genes in the operon.

We first constructed the clone pMQ8, in which gro-1
is directly under the promoter for the whole operon,
using the hybrid primers SHP160 (SEQ. ID. NO:38) and
SHP159 (SEQ. ID. NO:37) and the flanking primers SHP161
(SEQ. ID. NO:39) and SHP162 (SEQ. ID. NO:40) in
sequential reactions, each followed by purification of
the products, and finally cloning into pUC18 (Fig. 11).

Primers SHP151 (SEQ. ID. NO:36) and SHP170 (SEQ.
ID. NO:44) were then used to amplify part of the
insert in pMQ8, which was cloned into pPD95.77 (gift from
Dr Andrew Fire), a vector designed to allow a protein of
interest to be transcriptionally fused to Green
Fluorescent Protein (GFP) (Fig. 12).

The reporter construct fully rescues the
phenotype of a gro-1(e2400) mutant upon injection and
extrachromosomal array formation, indicating that the
fusion to the GFP moiety does not significantly inhibit
the function of GRO-1. Fluorescent microscopy indicated
that gro-1 is expressed in most or all somatic cells.
Furthermore, the GRO-1::GFP fusion protein is localized
in the mitochondria, in the cytoplasm as well as in the
nucleus.

The hap-1 gene product (Fig. 16) 

hap-1 is homologous to the yeast gene HAM1 as
well as to sequences in many organisms including bacteria
and mammals (Fig. 7).

The origin of the worm and yeast sequence is as
described above and below. The human sequence was
inferred from a cDNA sequence assembled from expressed
sequence tags (ESTs); the accession numbers of the
sequences used were: AA024489, AA024794, AA025334,
AA026396, AA026452, AA026502, AA026503, AA026611,
AA026723, AA035035, AA035523, AA047591, AA047599,
AA056452, AA115232, AA115352, AA129022, AA129023,
AA159841, AA160353, AA204926, AA226949, AA227197 and
D20115. The E. coli sequence is a predicted gene
(accession 1723866).

Mutations in HAM1 increase the sensitivity of
yeast to the mutagenic compound 6-N-hydroxylaminopurine
(HAP), but do not increase spontaneous mutation
frequency (Noskov et al., Yeast 12: 17-29 (1996)). HAP is
an analog of adenine, and in vitro experiments suggest
that the mechanism of HAP mutagenesis is its conversion
to a deoxynucleoside triphosphate which is incorporated
ambiguously for dATP and dGTP during DNA replication
(Abdul-Masih and Bessman, J Biol Chem 261(5): 2020-2026
(1986)). The role of the Ham1p gene product in
increasing sensitivity to HAP remains unclear.

Explaining the pleiotropy of miaA and gro-1

Mutations in miaA, the bacterial homologue of
gro-1, produce multiple phenotypes and affect cellular
growth in complex ways. For example, in Salmonella
typhimurium, such mutations result in 1) a decreased
efficacy of suppression by some suppressor tRNAs, 2) a
slowing of ribosomal translation, 3) slow growth under
various nutritional conditions, 4) altered regulation
of several amino acid biosynthetic operons, 5) sensitivity
to chemical oxidants and 6) temperature sensitivity
for aerobic growth (Ericson and Bjork, J. Bacteriol.
166: 1013-1021 (1986); Blum, J. Bacteriol. 170:
5125-5133 (1988)). Thus, MiaAp appears to be important
in the regulation of multiple parallel processes of
cellular physiology. Although we have not yet explored
the cellular physiology of gro-1 mutants along the
lines which have been pursued in bacteria, the apparently
central role of miaA is consistent with our findings
that gro-1, and the other genes with a Clk phenotype,
regulate many disparate physiological and metabolic
processes in C. elegans (Wong et al., Genetics
139: 1247-1259 (1995); Lakowski and Hekimi, Science
272: 1010-1013 (1996); Ewbank et al., Science 275: 980-983
(1997)).

In addition to the various phenotypes discussed
above, miaA mutations increase the frequency of spontaneous
mutations (Connolly and Winkler, J Bacteriol
173(5): 1711-21 (1991); Connolly and Winkler, J Bacteriol
171: 3233-46 (1989)). As described in the previous
section, we have preliminary evidence that
gro-1(e2400) also increases the frequency of
spontaneous mutations in worms.

How can the alteration in the function of DMAPP
transferase result in so many distinct phenotypes?
Bacterial geneticists working with miaA have generally
suggested that this enzyme and the tRNA modification it
catalyzes have a regulatory function which is mediated
through attenuation (e.g. Ericson and Bjork, J. Bacteriol.
166: 1013-1021 (1986)). Attenuation is a phenomenon
by which the transcription of a gene is interrupted
depending on the rate at which ribosomes can
translate the nascent transcript. Ribosomal translation
is slowed in miaA mutants, and this slowing could,
through its effect on attenuation, alter the expression
of the many genes that are regulated by attenuation.

gro-1(e2400) also produces pleiotropic effects
and, in addition, displays a maternal effect, suggesting
that it is involved in a regulatory process (Wong
et al., Genetics 139: 1247-1259 (1995)). However,
attenuation involves the co-transcriptional translation
of nascent transcripts, which is not possible in
eukaryotic cells, where transcription and translation are
spatially separated by the nuclear membrane. If the
basis of the pleiotropy in miaA and gro-1 is the same,
then a mechanism distinct from attenuation has to be
involved. Below we argue that this mechanism could be
the modification by DMAPP transferase of adenine residues
in DNA in addition to modification of tRNAs.

A role for gro-1 in DNA modification?

We observed that gro-1 can be rescued by a
maternal effect, so that adult worms homozygous for the
mutation, but derived from a mother carrying one wild-type
copy of the gene, display a wild-type phenotype, in
spite of the fact that such adults are up to 1000-fold
larger than the egg produced by their mother. It is
unlikely that enough wild-type product can be deposited
by the mother in the egg to rescue an adult which is
1000 times larger. This observation therefore suggests
that gro-1 can induce an epigenetic state which is not
altered by subsequent somatic growth. One of the best
documented epigenetic mechanisms is imprinting in mammals
(Lalande, Annu Rev Genet 30: 173-196 (1996)), which
is believed to rely on the differential methylation of
genes (Laird and Jaenisch, Annu Rev Genet 30: 441-464;
Klein and Costa, Mutat Res 386: 103-105 (1997)). Modification
of bases in DNA has also been linked to regulation
of gene expression in the protozoan Trypanosoma
brucei. The presence of beta-D-glucosyl-hydroxymethyluracil
in the long telomeric repeats of T. brucei
correlates with the repression of surface antigen gene
expression (Gommers-Ampt et al., Cell 75: 1129-1136
(1993); van Leeuwen et al., Nucleic Acids Res 24:
2476-2482 (1996)).

Mutations in gro-1 and miaA increase the rate of
spontaneous mutations, which is generally suggestive of a
role in DNA metabolism, and can be related to the observation
that methylation is linked to spontaneous mutagenesis,
genome instability, and cancer (Jones and Gonzalgo,
Proc. Natl. Acad. Sci. USA 94: 2103-2105 (1997)).

Does gro-1 have access to DNA? Studies with
mod5, the yeast homologue of gro-1, have shown that
there are two forms of Mod5p: one is localized to the
nucleus as well as to the cytoplasm, and the other
is localized to the mitochondria as well as the
cytoplasm (Boguta et al., Mol. Cell. Biol. 14: 2298-2306
(1994)). The nuclear localization is striking, as
isopentenylation of nuclear-encoded tRNA is believed to
occur exclusively in the cytoplasm (reviewed in Boguta
et al., Mol. Cell. Biol. 14: 2298-2306 (1994)).
Furthermore, studies of the gene maf1 have shown that
when Mod5p is mislocalized to the nucleus, the
efficiency of certain suppressor tRNAs is decreased, an
effect known to be linked to the absence of the tRNA
modification (Murawski et al., Acta Biochim. Pol. 41:
441-448 (1994)). Finally, as described in the previous
section, gro-1 contains a zinc finger, a nucleic acid
binding motif. The zinc finger could bind tRNAs, but
as it is in the C-terminal domain of gro-1 and human
hgro-1 that has no equivalent in miaA, it is clearly
not necessary for the basic enzymatic function. We
speculate that it might be necessary to increase the
specificity of DNA binding in the large metazoan
genome. It should also be noted that the second form
of Mod5p, which is localized to mitochondria, also has
the opportunity to bind and possibly modify DNA, as it
has access to the mitochondrial genome. See the
section entitled "A role for gro-1 in a
central mechanism of physiological coordination" for an
alternative possibility as to the function of GRO-1 in
the nucleus.

miaA and gro-1 are found in complex operons

We have found that gro-1 is part of a complex
operon of five genes (Fig. 4). It is believed that
genes are regulated coordinately by single promoters
when they participate in a common function (Spieth et
al., Cell 73: 521-532 (1993)). In some cases, this is
well documented. For example, the proteins LIN-15A and
LIN-15B, which are both required for vulva formation in
C. elegans, are unrelated products of two genes transcribed
in a common operon (Huang et al., Mol Biol Cell
5(4): 395-411 (1994)). One of the genes in the gro-1
operon is hap-1, whose yeast homologue has been shown
to be involved in the control of mutagenesis (Noskov et
al., Yeast 12: 17-29 (1996)). Under the hypothesis
that gro-1 modifies DNA, this suggests an involvement of
hap-1 in this or similar processes. The presence in
the same operon also suggests that all five genes might
collaborate in a common function. The phenotype of
gro-1 suggests that this function is regulatory. In
this context, it should be noted that miaA is also part
of a particularly complex operon (Tsui and Winkler,
Biochimie 76: 1168-1177 (1994)), although, except for
miaA/gro-1, there are no other homologous genes in the
two operons.

A role for gro-1 in a central mechanism of physiological
coordination

We have speculated that the genes with a Clk
phenotype might participate in a central mechanism of
physiological coordination, probably including the
regulation of energy metabolism. clk-1 encodes a
mitochondrial protein (unpublished observations), and
its homologue in yeast has also been shown to be
mitochondrial (Jonassen, T. (1998) Journal of Biological
Chemistry 273:3351-3357). The yeast clk-1 homologue is
involved in the regulation of the biosynthesis of
ubiquinone (Marbois, B.N. and Clarke, C.F. (1996)
Journal of Biological Chemistry 271:2995-3004).
Ubiquinone, also called coenzyme Q, is central to the
production of ATP in mitochondria. In worms, however,
we have found that clk-1 is not strictly required for
respiration. How might gro-1 fit into this picture?

One link is that dimethylallyl diphosphate (DMAPP) is
known to be the precursor of the lipid side-chain of
ubiquinone. In bacteria, ubiquinone is the major lipid
made from DMAPP. In eukaryotes, cholesterol and its
derivatives are also made from DMAPP. Interestingly,
C. elegans requires cholesterol in the growth medium
for optimal growth. This link, however, remains tenuous,
in particular in the absence of an understanding
of the biochemical function of CLK-1.

In several bacteria, the adenosine modification
carried out by DMAPP transferase is only the first step
in a series of further modifications of this base
(Persson et al., Biochimie 76: 1152-1160 (1994)).
These additional modifications have been proposed to
play the role of a sensor for the metabolic state of
the cell (Buck and Ames, Cell 36: 523-531 (1984);
Persson and Bjork, J. Bacteriol. 175: 7776-7785
(1993)). For example, one of the subsequent steps, the
synthesis of 2-methylthio-cis-ribozeatin, is carried
out by a hydroxylase encoded by the gene miaE. When
the cells lack miaE, they become incapable of using
intermediates of the citric acid cycle such as fumarate
and malate as the sole carbon source.

Another link to energy metabolism springs from
the recent biochemical observations of Winkler and co-workers
using purified DMAPP transferase (E. coli
MiaAp) (Leung et al., J Biol Chem 272: 13073-13083
(1997)). These investigators observed that the enzyme
is competitively inhibited by phosphate nucleotides
such as ATP or GTP. Furthermore, using their estimates
of the Km of the enzyme and its concentration in the cell,
they calculate that the level of inhibition of the
enzyme in vivo would exactly allow the enzyme to modify
all tRNAs, but any further inhibition would leave
some tRNAs unmodified. This suggests that the exact level
of modification of tRNA (or of DNA) could be exquisitely
sensitive to the level of phosphate nucleotides.
Superficially, this is consistent with the phenotypic
observations. The state of mutant cells which lack
DMAPP transferase entirely would be equivalent to that
of cells in which very high levels of ATP completely inhibit
the enzyme. Such cells might therefore turn down the
ATP-generating processes in response to the signal provided
by undermodified tRNAs (or DNA).
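
As an illustration only, and not as part of the reported experiments, the threshold argument above can be sketched with the standard rate law for competitive inhibition, v = Vmax·S / (Km·(1 + I/Ki) + S). The function names competitive_rate and fraction_modified, the "demand" quantity, and all numerical values are assumptions introduced here for the example; they are not taken from Leung et al.

```python
def competitive_rate(s, i, vmax=1.0, km=1.0, ki=1.0):
    """Michaelis-Menten rate in the presence of a competitive inhibitor.

    s: substrate concentration; i: inhibitor concentration (e.g. ATP).
    vmax, km and ki are placeholder constants, not measured values.
    """
    if s < 0 or i < 0 or vmax <= 0 or km <= 0 or ki <= 0:
        raise ValueError("concentrations must be >= 0 and constants > 0")
    # A competitive inhibitor raises the apparent Km by the factor (1 + i/ki).
    return vmax * s / (km * (1.0 + i / ki) + s)


def fraction_modified(demand, s, i, **constants):
    """Fraction of tRNA that can be modified, capped at 1.

    demand is a hypothetical rate that would be needed to modify every tRNA.
    """
    if demand <= 0:
        raise ValueError("demand must be > 0")
    return min(1.0, competitive_rate(s, i, **constants) / demand)


if __name__ == "__main__":
    # Raising the inhibitor level past the point where rate == demand
    # leaves part of the tRNA pool unmodified.
    for inhibitor in (0.0, 0.5, 1.0, 2.0):
        print(inhibitor, round(fraction_modified(0.4, 1.0, inhibitor), 3))
```

In this toy model the modified fraction stays at 1 until the inhibited rate falls below the assumed demand, after which it declines, which is the qualitative behaviour the paragraph above describes.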

More generally, GRO-1 could act in the crosstalk
between nuclear and mitochondrial genomes. The nuclear
and mitochondrial genomes both contribute gene products
to the mitochondrial energy-producing machinery, and
these physically separate genomes must therefore
exchange information somehow to coordinate their
contributions (reviewed in Poyton, R.O. and McEwen, J.E.
(1996) Annu. Rev. Biochem. 65:563-607). Furthermore,
the energy-producing activity of the mitochondria is
essential to the rest of the cell, and the needs of a
particular cell at a particular time must somehow be
conveyed to the organelle to regulate its activity. GRO-1
could participate in this coordination in the following
manner. GRO-1 is found in three compartments, the
nucleus, the cytoplasm and the mitochondria (see
above), and thus has the opportunity to regulate gene
expression in more than one way. How could its action
coordinate gene expression between compartments? GRO-1
could partition between the mitochondria and the
nucleus, and its relative distribution could be
determined by the amount of RNA (or mtDNA) in the
mitochondria (Parikh, V.S. et al. (1987) Science
235:576-580). For example, if the cell is rich in
mitochondria, much GRO-1 will be bound there, which
could result in a relative depletion of activity in the
cytoplasm, with regulatory consequences for the
translation machinery. Binding of GRO-1 in the nucleus
could have similar consequences and provide information
about nuclear gene expression to the translation
machinery.

While the invention has been described in connection
with specific embodiments thereof, it will be
understood that it is capable of further modifications
and this application is intended to cover any variations,
uses, or adaptations of the invention following,
in general, the principles of the invention and
including such departures from the present disclosure
as come within known or customary practice within the
art to which the invention pertains and as may be
applied to the essential features hereinbefore set
forth, and as follows in the scope of the appended
claims.

WHAT IS CLAIMED IS:

1. A gro-1 gene which has a function at the level
of cellular physiology involved in developmental rate
and longevity, wherein gro-1 is located within an
operon and gro-1 mutants have a longer life and an
altered cellular metabolism relative to the wild-type.

2. The gro-1 gene of claim 1, wherein said operon
has the nucleotide sequence set forth in SEQ ID. NO:1.

3. The gro-1 gene of claim 1, which codes for a 
GRO-1 protein having the amino acid sequence set forth 
in Figs. 3A-3B (SEQ ID. NO:2). 

4. A gop-1 gene which codes for a GOP-1 protein
having the amino acid sequence set forth in Figs. 13A-13C
(SEQ ID. NO:4).

5. A gop-2 gene which codes for a GOP-2 protein
having the amino acid sequence set forth in Fig. 14
(SEQ ID. NO:5).

6. A gop-3 gene which codes for a GOP-3 protein
having the amino acid sequence set forth in Figs. 15A-15B
(SEQ ID. NO:6).

7. A hap-1 gene which codes for a HAP-1 protein
having the amino acid sequence set forth in Fig. 16
(SEQ ID. NO:7).

8. The gro-1 gene of claim 1, wherein said gene is 
of human origin and which has the nucleotide sequence 
set forth in Fig. 8 (SEQ ID. NO: 3). 

9. A GRO-1 protein which has a function at the
level of cellular physiology involved in developmental
rate and longevity, wherein said GRO-1 protein is
encoded by the gene of claim 1 or 2.

10. A mutant GRO-1 protein which has the amino acid
sequence set forth in Fig. 3C.

11. A GRO-1 protein which has the amino acid 
sequence set forth in Figs. 3A-3B (SEQ ID. NO: 2). 

12. A GOP-1 protein which has the amino acid 
sequence set forth in Figs. 13A-13C (SEQ ID. NO:4). 

13. A GOP-2 protein which has the amino acid 
sequence set forth in Fig. 14 (SEQ ID. NO: 5). 

14. A GOP-3 protein which has the amino acid 
sequence set forth in Figs. 15A-15B (SEQ ID. NO: 6). 

15. A HAP-1 protein which has the amino acid 
sequence set forth in Fig. 16 (SEQ ID. NO:7). 

16. A method for the diagnosis and/or prognosis of 
cancer in a patient, which comprises the steps of: 

a) obtaining a tissue sample from said patient; 

b) analyzing DNA of the obtained tissue sample of 
step a) to determine if the human gro-1 gene is 
altered, wherein alteration of the human gro-1 gene is 
indicative of cancer. 



17. A mouse model of aging and cancer, which comprises
a gene knock-out of a murine gene homologous to the
gro-1 gene of claims 1 to 3.

18. The use of compounds interfering with enzymatic
activity of GRO-1 of claim 9, 10 or 11 for enhancing 
longevity of a host. 

19. The use of compounds interfering with enzymatic 
activity of GOP-1 of claim 12 for enhancing longevity 
of a host. 

20. The use of compounds interfering with enzymatic 
activity of GOP-2 of claim 13 for enhancing longevity 
of a host. 

21. The use of compounds interfering with enzymatic 
activity of GOP-3 of claim 14 for enhancing longevity 
of a host. 

22. The use of compounds interfering with enzymatic 
activity of HAP-1 of claim 15 for enhancing longevity 
of a host. 

23. The use of compounds interfering with enzymatic 
activity of GRO-1 of claim 9, 10 or 11 for inhibiting 
tumorous growth. 

24. The use of compounds interfering with enzymatic
activity of GOP-1 of claim 12 for inhibiting tumorous
growth.

25. The use of compounds interfering with enzymatic 
activity of GOP-2 of claim 13 for inhibiting tumorous 
growth. 



26. The use of compounds interfering with enzymatic
activity of GOP-3 of claim 14 for inhibiting tumorous
growth.

27. The use of compounds interfering with enzymatic
activity of HAP-1 of claim 15 for inhibiting tumorous
growth.



WO 99/10482 



PCT/CA98/00803 




SL2 MIFRKFLNFLKPYKMR 16 

aaaatatcgtcaggaaataataacatttcagatataccctgaactctacagtttATGATATTCAGGMTTTCTGMTTTTCTGAMCCTTACAwS 1394 

T D P I I F V I G C T G T G K S D L G V A I A K K Y G G E V I S V 49 

GAACGGATCCGATTATTTTCGTGATTGGGTGCACTGGMCCGGGAAMGTGATCTTGGAGTGGCMTTGCRMGRMTATGGAGGAGR GGTGATTAGTGT 1494 

1 SHP109 

D S M Q F Y K G L D I A T S K I T 66 

AGATTCAATGCMTTTTATAAAGgtacatgggttttgtttcaattttaaattaattaattttcgtttttcagGACTTGACATTGCCACGAATAAGATAAC 1594 



EEESEGIQHHMMSFLNPSESSSYNVHSFREVTL 99 

GGAAGMGMTCTGAAGGGATTCMCATCATATGATGTCATTTTTGMTCCATCTGAATCATCATCTTATAATGTACATAG TTTCCGAGMGTCACGTTG 1694 

SHP94 T 

D L I K KIRARSKIPVIVG 116 

GATCTTATTAMgtgcttaattcgccactttttgaacttgatcctaattttcataattttcagAAAATCCGCGCCCGTTC^^ 1794 

* SHP95 

GTTYYAESVLYENNLIETNTSDDVDSKSRTSSE 149 

GAGGMCCACTTATTATGCTGAAAGTGTCCTTTATGAGAATAATCTGATTGAAACCAACA CTTCAGATGACGTGGATTCCA AATCGAGAACATCATCAGA 1894 

8HP96 T 

S S S E D T E E G I S N Q E L K D E I K K I D E K S A L L L H P N 182 

ATCGTOTCTGMGACACTGMGMGGMTTAGTMTfMGMTTATGGGATGMTTGAAAAAM 1994 
gro-1 continued...

tTRYRVQRALQIFRETG 198 
AATCGTTATCGAGTACAGAGAGCATTGCAAATTTTCAGAGAAACTGgtaattgatttgcaaatttccagattaaaaacaaatcaagtaaagttttttgca 2094 



IRKSELVEKQKSDETVDLGGRLRFDNSLVIFMD 231 

gGMTCCGAAAMGTGAACTTGTTGAAAAACAGAAATCAGATGAAACTGTTGATTTGGGTGGACGACTACGATTTGATAATTCTTTAGTTATTTTTATGG 2194 

ATPEVLEERLDGRVDKMIKLGLKNELIEFYNE 263 

ATGCAACACCTGAAGTTTTAGAAGAAAGACTTGATGGAAGAGTTGATAAAATGATTAAATTGGGTTTGAAGAATGAATTGATCGAGTTTTATAACGAGgt 2294 



aaatatttgaatttttccagaaaaaaaaagaaaattttttattattttgtttttttttcattctttactattttccaaaaaagtttaaacttttgaaaac 2394 



H A E Y 267 

tgttcagaaaatgttcgtgtatttattttagcttactgaggcattatttcattgtgatttttactatactctataaactaaattttcagCACGCCGAGTA 2494 



INHSKYGVMQCIGLKEFVPWLNLDPSERDTLNG 300 

CATAAATCACAGCAAATATGGTGTCATGCAATGTATTGGTCTTAAAGAATTCGTTCCATGGCTCAATTTGGACCCATCAG AAAGAGATACACTCAATGGG 2594 

^-CG^oOlesion ? 

D K L F K Q G CDDVKLHTRQY 318 

GATAAATTGTTCAAGCAAGGgtaatttaaatttattttcaatttttataaattccaagctattttcagATGCGATGATGTGAAGCTTCACACTCGACAAT 2694 
ARRQRRWYRSRLLKRSDGDR 33

ATGCACGGCGCCAGAGACGGTGGTATCGATCGAGACTTTTAAAACGGTCGGATGGTGATCGGgtatgttgattttaaaaaaattgaatttttaaagaact 279 

tttttactaaattaacaaagttattggctgaaaatggctgaaaattatagtaaaactaatcaaaaaaattgaaattttgaattaaagtcataaagtgacg 289 

KMASTKMLD 34 

accagaaaattaaaaaaaaacatttttctattttaattaattcactctacttcactttaaaaataattttcagAAAATGGCAAGTACAAAARTGCTGGAT 299 



T S D K Y R I I S D G H D I V D Q W M 8 G I D L F E D 37 
ACATCTGACAAGTACCGMTAATTAGTGATGGMTGGACATTGTTGATCMTGGATGAATGGAATCGATCTATTTGAAGATgtaaaatttcacaaattct 309 



ISTDTNPILKGSDANILLNCEI 39 
aaaatttccgaatcacaaattaaaatttctacagATCTCCACAGACACCAATCCAATTCTAAAAGGGTCCGATGCAAATATTCTGCTGAATTGTGAAATC 319 

CNISMTGKDNW QKEIDGKK 41 

T(MATTTCAATGACTGGAAMGATMTGgtttgtto 329 
SHP110 T T 3HP100 

HKHHAKQKKLAETRT* 43 

GCACAAGCATCATGCTAAGCAAAAGAAATTGGCAGAGACTCGCACAtaagacgctatatttattttttgttaacttaaattatttttgttgttgattgtt 339 

polyA 

ctctaaataaaaaaacagctcagagagaa'g^Waggcgctcgtccacatctccgacgatagtcaacccgaacgaagggaactatctttaattgtcagtga 349 

* SHP92 
tgatttttactatactctataaactaaattttcagCACGCCGAGTACATAAATCACAGCAAATATGGTGTCACG 1197

HAEYINHSKYGVT 276 

TTGGTCTTAAAGAATTCGTCCATGGCTCAATTOA 1272 

IiVLKNSFHGSIWTHQKWIHSMGINC 301 

TCAAGCAAGGgtaatttaaatttattttcaatttttataaattccaagctattttcagATGCGATCATGtgaagcttc 1350 

8 8 I D A M K • 308 
Sequence of GRO-1 and homologies



I I I I I I I Ml III I 

C.elegans 1 MIFRKFLNFLKPYKMRTDPIIFVIGCTGTGKSDLGVAIAKKYGGEVISVDSMQFYKGLDIATNKITEEESEGIQ
S.cerevisiae 1 MLKGPLKGCLNMSKKVIVIAGTTGVGKSQLSIQLAQKFNGEVINSDSMQVYKDIPIITNKHPLQEREGIP
E.coli 1 MSDISKASLPKAIFLMGPTASGKTALAIELRKILPVELISVDSALIYKGMDIGTAKPNAEELLAAP

ATP/GTP 

binding site 



C.elegans ft HHMSFLNPSESSSYNVHSFREVTLDLIKKIRARSKIPVIVGGTTYYAESVLYENNLIETNTSDDVDSKSRTSSE 
S.cerevisiae 12 HVMNHVDWSE--EYYSHRFETECMNAIEDIHRRGKIPIWGGTHYYLQTLFNKRVDTKSSERKLTRKQLDILES 
E.coli 68 RLLDIRDPSQ--AYSAADFRRDALAEMADITAAGRIPLLVGGTMLYFKALLEGLSPLPSADPEVRARIEQQAAE 



I I l III III II * 

C.elegans m SSEDTEEGISNQELWDELKKIDEKSALLLHPNNRYRVQRALQIFRETGIRKSELVEKQKSDETVDLGGRLRFDN 

S.cerevisiae i« DPDV IYNTLVKCDPDIATKYHPNDYRRVQRMLEIYYKTGRKPSETFNEQK ITLKFD- 

E.coli 143 GWES LHRQLQEVDPVAAARIHPNDPQRLSRALEVFFISGKTLTELTQTSG DALPYQV

i t « i • • • 

C.elegans 226 LVIFMDATPEVLEERLDGRVDKMIKLGLKNELIEFYNEHAEYINHSKYGVMQCIGLKEFVPWLNLDPSERDTLN 
S.cerevisiae 205 LFLWLYSKPEPLFQRLDDRVDDMLERGALQEIKQLYEYYSQNKFTPEQCENGVWQVIGFKEFLPWLTGKTDDNT

E.coli 202 QFAIAPASRELLHQRIEQRFHQHLASGFEAEVRALFARGDLHTDLPSIRCVGYRQHWSYLEGEISYDEMVYRGV

III I M I ( 

C. elegans » DKLFKQGCDDVKLHTRQYARRQRRWYRSRLLRRSDGDRKMASTKMLDTSDKYRIISDGMDIVDQWMNGIDLFED 
S.cerevisiae 280 KLEDC I ERMKT - - RTRQYAKRQVKWIKKMLIPDIK6DILLDATDLSQWDTNASQRAIA ISHDF ISNRPIKQERA
!(C0li m — - ATRQLARRQITWLRGWEGVHWLDSEKPEQARDEVLQVVGAIAG 

, C2H2 zinc finger . 

C.elegans m ^TnTWPIT.KG^DANILLN CEICNISMTGKDNWOKHIDGKKHK HHAKQKKLATRT

S.cerevisiae 353 KALEELLSKGETTMKKLDDWTHYTRNVCRNADGKNVVAIGEKYWKIHLGSRRBKSNLKRNTRQADFEKWKINKK
Sequence of HAP-1 and its homologies



•Mill * 

H. Sapiens MAASLVGKKIVFVTGNAKKLEEWQILGDKFP CTLVAQKIDLPEYXG- EPDEI SIQKCQE 

C, elegaDS MLYILWKLKYLQKKMSLRKINFVTGNVKKLEEVKAILKNFE VSNVDVDLDEFQG-EPEFIAERKCRE 

s. cerevisiae msnneivpvtgnanklkzvqsiltqevdnnnktihlinealdleelqdtdlnaialakgkq 

£, coll MQKWLATGNVGKVRELASLL SDFGLD IVAQTDLGVDSAEETGLTFIENAILKA 



III I I I • • 

H. sapiens AVRQV-QO-PVLVEDTCLCFNALGXLPGPYIKHFL--EKLKPEGLHQLLAGFED KSAYALCTFALSTGDP 

C, elegans AVEAV-KG-PVLVEDTSLCFNAMGGLPGPYIKWFL--KNLKPEGLHNHLAGFSD KTAYAQCIFAYTEG-L 

S, cerevisiae AVAALGKGKPVFVEDTALRFDEFNGLPGAYIKWFL--KSMGLEKIVKMLEPFEN KNAEAVTT ICF AD SRG 

£, COii RHAAKVTALPAIADDSGLAVDVLGGAPGIYSARYSGEDATDQKNLQKLLETMKDVPDDQRQARFHCVLVYLRHAE 



I II I Mil 



II I Ml • 



h. sapiens sqpvrlfrgrtsgriv-aprgcqdfgwdpcfqp-dgyeqtyaehpkaeknavshrfrallelqeyfgslaa 

C, elegans GKPIHVFAGKCPGQIV-APRGDTAFGWDPCFQP-DGFKETFGEBDKDVKNEISHRAKALELLKEYFQNN 

S, cerevisiae E — YHFFQGITRGKIV-PSR6PTTFGWDSIFEPFDSHGLTYAEHSKDAKNAISHRGKAFAQFKEYLYQNDF 

£• COli DPTPLVCH6SWPGVITREPAGTGGFGYDPIFFV-PSEGKTAAELTREEKSAISHRGQALKLLLDALRNG 
mRNA sequence of human homologue of gro-1: hgro-1

CTGCCATAAG [illegible] [illegible] [illegible] [illegible]
GTGGGCTCAG GGGCCTGCAA CGGACCCTAG [illegible] [illegible]
GCCACGGGCA CCGGCAAATC CACGCTGGCG [illegible] [illegible]
CGGCGGTGAG ATCGTCAGCG [illegible] [illegible] [illegible]
ACATCATCAC [illegible] [illegible] [illegible] CACATGATCA
[illegible] [illegible] [illegible] [illegible] CTTCAGAAAT
AGAGCAACTG [illegible] [illegible] [illegible] AAATTCCTAT
TGTTGTGGGA GGAACCAATT [illegible] [illegible] TGGAAAGTTC
TTGTCAATAC CAAGCCCCAG [illegible] [illegible] GATTGACCGA
AAAGTGGAGC TTGAAAAGGA [illegible] [illegible] AACGCCTAAG
CCAGGTGGAC CCAGAAATGG CTGCCAAGbT GCATCCACAT GACAAACGCA
AAGTGGCCAG GAGCTTGCAA GTTTTTGAAG AAACAGGAAT CTCTCATAGT
GAATTTCTCC ATCGTCAACA TACGGAAGAA GGTGGTGGTC CCCTTGGAGG
TCCTCTGAAG TTCTCTAACC CTTGCATCCT TTGGCTTCAT GCTGACCAGG
CAGTTCTAGA TGAGCGCTTG GATAAGAGGG TGGATGACAT GCTTGCTGCT
GGGCTCTTGG AGGAACTAAG [illegible] [illegible] ATCAGAAGAA
TGTTTCGGAA AATAGCCAGG ACTATCAACA TGGTATCTTC CAATCAATTG
GCTTCAAGGA ATTTCACGAG TACCTGATCA CTGAGGGAAA ATGCACACTG
GAGACTAGTA ACCAGCTTCT AAAGAAAGGA CCTGGTCCCA TTGTCCCCCC
TGTCTATGGC TTAGAGGTAT CTGATGTCTC GAAGTGGGAG GAGTCTGTTC
TTGAACCTGC TCTTGAAATC GTGCAAAGTT TCATCCAGGG CCACAAGCCT
ACAGCCACTC CAATAAAGAT GCCATACAAT GAAGCTGAGA ACAAGAGAAG
TTATCACCTG TGTGACCTCT GTGATCGAAT CATCATTGGG GATCGCGAAT
GGGCAGCGCA CATAAAATCC AAATCCCACT TGAACCAACT GAAGAAAAGA
AGAAGATTGG ACTCAGATGC TGTCAACACC ATAGAAAGTC AGAGTGTTTC
CCCAGACTAT AACAAAGAAC CTAAAGGGAA GGGATCCCCA

GGGCAGAATG ATCAAGAGCT GAAATGCAGC G T T T AA G AG A CATGTCCAGT 
GGCCTTTGGA AAGGTGGTGG GGATCCAGTT CAGGAGGGAG GGGTATGTTT 
GTCTCCCAGT CTGGGCAAAG GAGTGCTATG CGGAATTCTC TGCATAGCAG 
AAAAGCTCCC ACCATTTTCT TTTGATGTGG TTTTAAAGTC TCACGTTCTC 
TATAATAGAA ACAGCAGGTC TTGTCAGCTC CTTGTGTGGC TGATGTGTCT 
GGAAATGATG TAGTTCAGGA AAGCATTTTT TTTTTCTTTG AACCTTAAAG 
GTTCTATTAT TAAAAGCAGC ACAGATTCCA CATTTTTATA CAT G AGG AT C 
TTCTTTGTGG TGAATACCAG GATTGACTGC ATCCCTTTAA AAGAAGTTTT 
ATGTCCCTGA CTCTGGCTAA AATTATCTAA TTTCCAGATG CTTTTGTAGA 
TGACTGAAGT ATTTGTGAGC CACATATTGG GAGTTCTAGA TTTGAGTGAA 
TGGCAGGAAA GGGCCATCTC CAT T GAG AT G ATTAAGTGAA CCAAACTAGT 
TCTCGGAATT CTACAGAGAA GGAGGGAATC AG AC T G AGG A AGCTGTGACA 
TAGGACTTGA AGACCAAAGA CTTTGAAATT TGCGAGCTGC TCATGTGTGA 
GTTATTATCA CTGCTGTCTT TCTATTGAGT TACAAATCTA TATTTTTATT 
GAAGTTTAAA TAAAGAAAAA ATTTACAAGA AAAAAAAAAA A 
MIFRKIOTLKPYKra



I I Ml M I II I I II I II Ml MM II II I 

nTEEESEGIQHHMSFOTSESSSYMSFREVTLDLiro 



I II I I I I • II I I M I MM 

TKPQEMGTEMDRMjEKEDGLV W^mmmmmSfmSSL 

etoddvdsksrtssessseim 
II % % Mill I I I MM IM I M M I

hgro-lp SElOTQHTEEOTffljOT 



I I I M I M I • • 

hgro-lp ^^^^^M^MW^M^WIMI 

GRD4 YIMISKY--(MQCIGIi(EFVPmDPSERDra 



M I 



hgro-lp VSDVSKWEESVLEPALEIVQSFI PKPTATPIKlYlAENKRSMr 

(30-1 RSDGDMASW^ 



II I I I I I l I 



hgro-lp CDmillGDRlMKSKSTO^ 

C2H2 zinc finger 
Structure of pMQ8



SacI

gtacgtg(gagctc)- 



SHP161 



atcgtgttccaggtgcjaactatatattgagcaggaggacgagttgtttgtttcatgctgcttaaaaataaaaatg 

, J SHP151 * 
cagcgagclgca- 

SphI

gaaaattgagtcaaaaagttgagataaaacaaattaaaacaattttctgaaaaataaacaactgaaatttgaagtaataaacaacacgcgaaaacgttat 



ttcggagcatcgtttgagaagtaaaactttttttcggcgcacccttgtgcgcagtttttatcttctcttttaatttaattttcaagctaaatctttcttt 



promoter 



ttaaactttq 



SHP160 



gro-1 



SHP159 



IFRKFLNFLKPYKMR 



^aataaatatttaaatattcag atataccctgaactctacagtttATGATAHCAGGAMTTTCTGMTTTTCTGAAACCTTACAAAATGC 



T D P I I F V I G C T G T G K S D L G V A I A K K Y G G E V I S V 
GAACGGATCCGATTATTTTCGTGATTGGGTGCACTGGAACCGGGAAAAGTGATCTTGGAGTGGCAATTGCAAAGAMTATGGAGGAGAGGTGATTAGTGT 
D S M Q F y K G L D I I T N . , .
AGATTCAATGCAATTTTATAMGgtacatgggttttgtttcaattttaaattaattaattttcgtttttcagGACTTGACATTGCCACGAAT 



\ I Q R K L A E T R T • 

taagacgctatatttattttttgttaacttaaattatttttgttgttgattgtt 

^(tctaga)tatact 
Xbal 



; . CATGCTAAGCAAAAGA AATTGGCAGAGACTCGC ACA1 

SHP170 



ctctaaataaaaaaacagctcagagagaagattaggcgctcgtccacatctccgacgatagtcaacccgaacgaagggaactatctttaattgtcagtga 

* SHP162 I 

^■(ctgcag)tgtcat 

PstI 



Construction of pMQ18



SHP151 



SHP170 



promoter 
SphI



a a a /\ a r SHP151SHP170 

A == /N=/t/\zAz/h PCR product amplified 
gro-1 from pMQ8 




1 kb 



pPD95.77 



gfp unc-54 3 UTR 



-[ 



gro-1 gfp 200 aa 



GRO-1 ::GFP Fusion Protein 
atcgtgttccaggtgcaactatatattgaqcaggaggacgagttgtttgtttcatgctgcttaaaaataaaaatggaaaattgagtcaaaaagttgagat -9551

aaaacaaattaaaacaattttctgaaaaataaacaactgaaatttgaagtaataaacaacacgcgaaaacgttatttcggagcatcgtttgagaagtaaa -9451 

actttttttcggcgcacccttgtgcgcagtttttatcttctcttttaatttaattttcaagctaaatctttctttttaaactttgaataaatatttaaat -9351 

MFRKLGSSGSLWKPKNPHSLE 21 

attcagaatgcaccaataaacctggaacaaaatcgata ATGTTCCGCMGCTTGGTTCT TCTGGGTCACTATGGMGCCGAAAAATCCGCATTCTTTGGR -9251 

Y L K Y L Q G V L T K N E K V T E N N K K I L V E A L R A I A E I 54 

ATACCTCMTATTTACAAGGAGTGCTCACAAAM^ -9151 

LIWGDQHDASVFD F F L E R 12 

CTCATTTGGGGCGATCAGAATGATGCTTCGGTTTTTGAgtgagtttttttccaatgttttttttcaaatctgatgttgaatttcagTTTCTTCCTTGAGC -9051 

QMLLYFLKIMEQGNTPLNVQLLQTLNILFENIR 105 

GGCAMTGCTTCTTTATTTCTTGAAMTTATGGMCAAGGAMCACA CCACTAAATGTACMTTACT GCAGACTTTGMCATTTTATTCGAAAATATTCG -8951 

T SHP171 

H E T S L Y FLLSHHHVHSII 123 

ACATGAAACTTCACTTTgtaagttttttatatggattttcgcttaaaattgccagttttcagATTTCCTTCTMGTMCAATCATGTAAACTCGATTATT -8851 

SHKFDLQNDEIMAYYISFLKTLSFKLNPATIHFF 151 

TCCCACAMTTCGATTTACAAAATGATGAGATCATGGCTTACTACATTAGTTTTCTGAAAACTCTTTCATTTAAACTGAATCCAGCTACAATCCACTTCT -8157 
F N E T T E E F P L L V E V L K L Y H W N E S H V R I A V R N I L 190

TCnCftATGAAACGRCTGRAGMTTT CCATTGTTGGTAGARGTTTT GRAGCTTTATMTTGGMTGMTCAATGGTTCGAATTGCTGTTAGAMTATTCT -8657 
^ SHM41 SHP172 f 

L N I V R V Q D D S M I I F A I R H T K 210 

TTTAAATATTGTGAGAGTTCAAGATGATTCAATGATTATTTTCGCTATCAAGCATACAAAAgttagtagaaaattattttgaaaaggtgtatttaagcaa -8551 

EYLSELIDSLVGLSLEMDTFVRSAENVLAN 240 

taaatattacagGAATATCTATCGGAGTTAATAGATTCTCTAGTTGGTCTCTCACTTGAAATGGACACATTTGTACGATCTGCTGAGAATGTGTTAGCTA -8457 



R E R L R G K V D D L I D L I H Y I G E L L D V E A V A E S L S I 273 

ATCGAGAGAGATTACGA GGAAAAGTGGATGATTTAATT GATTTGATTCATTATATTGGT GAACTATTGGATGTGGAAGCTGTCGCCGAAAGTTTATCAAT -8357 

SHP142 SHP173 * 

L v TTRYLSPLLLSSISPR 291 

TTTAGgtcagttttactgctggaaaatcaagtttttaatgttaaattttcagTAACAACACGATACTTAAGCCCTCTATTACTTTCAAGTATATCACCAA -8257 



RDNHSLLLTPISALFFFSEFLL 313 

GAAGAGATAATCATTCACTTCTACTCACTCCGATTTCTGCGTTATTTTTTTTCTCTGAATTTTTATTGgtgagttttaacatttaaaattacatttttct -8157 

IVRHHET1YTFLSSFLFDTQNTLTTHKI 341 

aatttatttatttttcagATAGTTCGTCACCATGAAACAATATATACATTTTTATCATCTTTCCTATTTGACACTCAGAATACTTTGACGACCCATTGGA -8057 

RHNEKYCLEPITLSSPTGEYVNEDH 366 

TACGTCATAATGAGAAATATTGCTTAGAACCGATTACATTATCATCACCAACCGGAGAATATGTGAATGAAGACCAgtaagagctgaaattttaaaattt -7957 

VFFDFLLEAFDSSQADDSKAFYGLM 391 

ttgctttgaatatagtattttcagCGTATTTTTCGATTTTCTACTGGAAGCATTTGATTCCAGTCAAGCAGACGATTCGAAGGCATTCTATGGATTAATG -7857 
gop-1 continued... 

LIYSMFQNNA 401 
CTGATTTATTCAATGTTTCAGAATAATGgtgagttttaaaaaattgatttgttaaattaaaatttccatttccaataactcctcttcagacagtaagttt -7757 

tcaatgttgtaaagttcctgttcatctgtgatcgttttcttcatttttttagttttgcatgaacagttttcaaatttttttgatatcatacagtaaatat -7657 

cgtcatccagataattttctatttaaaaaaaatgaataaaaagagggcgcgcagaaattgccgaagtaatgtaaatttaaagggacacatgcgtagcttg -7557 

ttgtgtgggtctcgccgcgctttgtttgatttatcttgttttctgctcaaagagctgtttttattttagcgttgaatgcttttttaccgttctcatcggc -7457 

tttttaataggaatatttaaaaaaaaaggtttaataaatcttcgtttttacaaaatccatctaagatttgcatttgtgaagctcaacaagtaaagtttta -7357 

agtaacattgttttttaaaaaacaattgaaccaaattttgccgaaacattaataacatgacgatactctataaaatattcctcttttcaaaataaatttt -7257 

DVGELLSAANFPVLKESTTTSLAQQN 427 
caaaaaaaatccatttttcagCCGATGTTGGAGAACTTCTATCTGCTGCCAACTTC CCAGTGCTCAAAGAATCAACG ACAACTTCATTAGCTCAACAGAA -7157 

T SHP174 

L A R L R I A S T S S I S K R T R A I T E I G V E A T E E D E I F 480 

TCTTGCTCGTCTCCGAATAGCATCTACGTCTTCCATATCAAAGCGAACGAGAGCTATCACTG AAATTGGAGTAGAAGCGACC GAGGAAGATGAGATTTTT -7057 

SHP185 * 

H D V P E E Q T L 469 

CATGATGTTCCTGAAGAACAAACGTTGgtaagtaaataaatcaacattgattgttacacaaactttaatatttttaaatttgaaaattttcttcaaagtg -6957 

EDLVDDVLVDTENSAISDPE 489 
ctcaaaaatcctgtcgaaaattacagGAAGATCTGGTGGATGATGTATTGGTTGATACTGAAAATTCAGCAATAAGTGATCCAGAAgtgagtagaaaacg -6857 

PKNVESESR 498 

tgcatgtattaattattaaaaaaaaaatatagttttccccagttttccttgacctaaaactcagcaatttcagCCTAAAAACGTGGAGTCAGAATCTCGT -6757 
S R F Q S A V D E L P P P S T S G C D G R L F D A L S S I I K A V G 532 
TCTCGATTTCAATCTGCTGTTGATGAGCTTCCACCTCCGTCGACTTCTGGATGTGATGGTCGACTTTTTGATGCACTTTCATCGATTATCAAAGCAGTTG -6651 

T D D N R I R P 1 T L E L A C L V I R Q I L M T V D D E R 561 
GAACAGATGACAATCG MTTCGACCAATTACATTGGM CTTGCATGTCTTGTAATTCGGCAAATTTTAATGACTGTTGATGATGAAAAAqtaagattaca -6551 

SHP175 * 

VHTSLTKLCFEVRLKLLS 519 
aattcaaaattgagcaaaatcagaatctaaatttcataaattgttcagGTACATACCAGTTTAACGAAATTATGCTTCGAAGTTCGTCTAAAACTTTTAT -6451 

SIGQYVNGENLFLEWFEDEYAEFE 603 
CATCAATTGGACAATATGTTAATGGAGAGAATCTGTTTTTGGAGTGGTTTGAGGATGAATATGCAGAATTTGAAgtaagccaagaggtccgaaaataatt -6351 

V N H V N F D I I G H E M L L P P A A T P L S N L L L 630 
taattcatcctttttattcagGTGAATCACGTGAATTTCGATATAATCGGTCACGAAATGCTTCTTCCTCCAGCTGCAACTCCTCTTTCGAATCTGCTAC -6251 

HKRLPSGFEERIRT Q I V 647 

TTCATAAGCGATTGCCCAGTGGATTTGAAGAACGAATAAGAACTgtaggaaactttttaaatttgaaaattaattatatatatatttgcagCAAATCGTA -6157 

FYLHIRKLERDLTGEGDTELPVRVLNSDQEPVAI 681 
TTCTACCTACATATTCGAAAATTGGAACGAGATTTGACCGGTGAAGGAGACACAGAATTACCTGTGAGAGTGTTGAATTCTGATCAGGAACCAGTTGCCA -6051 

G D C I I L H N S D L L S C T 696 

TCGGTGATTGTAITAATTTACqtgaqttcatctgcatagaaaacaccatatttctactcaaattaacaattttcaqATAATTCGGATCTTCTATCCTGCA -5951 

VVPQQLCSLGKPGDRLARFLVTDRLQLILVEPD 129 
CTGT GGTTCCTCAACAACTATGTTC TCTTGGAAAACCTGGTGATCGTCTTGCTCGATTCCTTGTCACTGATAGACTTCAATTAATTCTTGTCGAACCGGA "5851 

* si™ 

S R K A G « A I V R F V G L L Q D T T I N G D S T D S K V L H V V 162 
TTCTCGAAAAGCCGGATGGGCAATTGTTCGATTCGTAGGACTTCTTCAAGATACAAC AATTAATGGAGATTCTACGGA TTCGAAAGTTTTGCATGTTGTG -5151 

SHP177 T 

V E G Q P S R I K K R H P V I T A 119 

GTGGMGGGCMCTCGAGMTTMGgtaagaatactaacgggaaaaaaaaatcaaaaaattacttctgtttcagAAMGACATCCKTn -5651 
KFIFDDHIRCMAAKQRLTK 798 
AAGTTCATATTCGATGATCACATTCGGTGTATGGCAGCAAAGCAACGGCTCACCAAGgtaacggaaaaaataaccaaaaagacggaaagttattgtaaat -5557 



ggacgaaatcggcgaaattaattgaaaacgtttgaatttgccgctaaaaccaaacgaaaaccaaacqaaagcgaaatttaactatcccttcaggtagaat -5457 

G R Q T A R G L K L Q A I C S A L G V P R I D P A T 824 

atacattttatttctctttatagGGTCGCCAAACAGCACGTGGTCTGAAACTTCAGGCGATATGTTCAGCTCTTGGAGTTCCACGTATCGATCCAGCGAC -5357 

MTSSPRMNPFRIVKGCAPGSVRKTVSTSSSSSQ 857 

AATGACGTCATCACCACGAATGAATCCATTCAGAATTGTGAAAGGATGCGCACCGGGAAGTGTACGAAAAACTGTTTCCACATCATCATCGTCAAGCCAA -5257 

GRPGHYSANLRSASRNAGMIPDDPTQPSSSSERR 891 

GGACGTCCCGGACATTATTCTGCAAATCTTAGA TCAGCATCTAGAAATGCAGG AATGATACCAGATGATCCAACTCAACCGAGTAGTTCTTCGGAAAGAA -5157 

SHP178 * 

S . 892 

GATCCtagggatcaatatctcttcagtttcatcattttatgctgtaaattgtatttaagtattcctattctttgtagtactgtatttacacatcgtctag -5057 



ttaaaatcacaaatctccgaaaaaacaaaccagtgaacatgtgatatttctcttgcccatagttctcttttttttttgaaacaaaaacaattacttttat -4957 

(* 

gctcacctattcgagccatatttttttcccaattaccggttgtttattttaatttcttttttttttctgtaaatctactttatttttaaaactgcatttg -4857 



agattgtgtatattttttcaaaatggttcaaatgccgaatctatctactt -4807 
SL2 



^ MAEKAENLPSSSAEASE 1 

tttaatcattattcaaacagaaaaaccgattatttattcagattctcaaaaATGGCTGAAAAAGCTGAAAATCTTCCATCTTCTTCGGCCGMGCTTCAG -470 

EPSPQTGPNVNQKPSILVLGMAGSGKTTFVQ 4 

AAGAGCCATCACCTCAAACTGGACCAMTGTGAATCAAAAACCATCGATTTTGGTTCTTGGAATGGCTGGTTCTGGAAAAACGACATTTGTTCAGgtaac -460 

RLTAFLHARKTPPYVINLDP 6 

tttcattcaattttgagagttttcaaacattactattttcagCGTCTCACAGCATTCCTACATGCTCGTAAAACACCTCCATATGTGATTAATCTGGATC -450 

A V S K V P Y P V N V D I R D T V K Y K E V M K E F G M G P N G A 10 

CGGCAGTTAGCAMGTACCTTAT CCAGTGMTGTTGACATTCGA GATACTGTGAMTACAAGGMGTTATGAAAGAATTCGGAATGGGACCAAATGGAGC -440 

T SHP179 

IMTCLNLMCTRFDKVIELINKRSSDFSVCLLDT 13 

AATTATGACATG TCTTAACCTGATGTGTACTCG TTTTGATAAAGTAATTGAGTTGATTAATAAGAGATCTTCTGATTTCTCAGTTTGTCTTCTTGATACT -430 

SHP180 ? 

P G Q I E A F T W S A S G S I I T D S L A S S H P T 16 

CCTGGACAAATTGAAGCATTCACTTGGAGTGCTAGTGGATCTATTATCACTGATTCATT GGCAAGTAGCCATCCCACGgt aagggattttgatttatgaa -420 

T SHP143 



atctgcttgaaatgaaaaaagattctaataaatttttgacttttaaacattttttacagttatatttggtctattttctatcattaaaagcaaaatgaaa -410 

VVMYIVDSARATNPTTFMSN 18 
agtcgattctactccatatttattaatttcgacttttcagGTKTMTGTACATTGTGGATTCCGOT^ -400 

? SHP144 
M L Y A C S I L Y R T K L P F I V V F N K A D I V K P T F A L K W M 21 
ATGCTCTACGCATGTTCCATTCTCTACCGTACCAAACTTCCATTCATTGTCGTTTTCAACAAAGCTGATATTGTCAAACCAACATTTGCACTCAAATGGA -390 

QDFERFDEALEDARSSYMNDLSRSLSLVLDEFY 24 
TGCAAGATTTCGAAAGATTTGATGAAGCTTTAGAGGATGCCAGAAGCAGTTATATGAATGATTTGAGTCGTTCATTGAGTCTCGTTCTTGATGAATTCTA -380 

T 

SHP181 

C G L K T V CVSSATGEGFEDV 26 

TTGCGGACTGAAAACAGgtttttattcgaaataaaaccttttttaaataataaatttcagTTTGCGTCAGTTCTGCAACTGGAGAAGGATTCGAAGATGT -370 



M T A I D E S V E A Y K K E Y V P M Y E K V L A E K K L L D E E E 29 
MTGACAGCAATCGATGMAGTGTTGMGCATACAAAAAAGMTATGTTCCMTGTATGAAAMGTGTTGGCTGAGAAAAAACTATTGGATGAGGAGGAG -360 



R K K R D E E TLKGKAVHDLNKV 31 

AGAAAGAAAAGAGATGAAGAGgtaattgtagtaatttaattctgattatcttcaaattttcagACTCTGAAAGGAAAAGCTGTTCACGACCTGAACAAAG -350 

A N P D E F L E S E L N S K I D R I H L G G V D E E N E E D A E L 35 

TCGCCAATC CCGACGAATTTCTGGAGTCGG AGTTGAATTCAAAAATCGATAGAATTCATTTGGGCGGAGTCGATGAAGAGAATGAGGAGGATGCTGAACT -340 

" SHP182 T 

E R S • 35 

CGAMGATCCtgattttctttttgtttttgaatttttattctattttgatccctgtttacttcttattgttctcattttgttgcgttgttttacatttta -330 



polyA 

ctcatttttgcataaacttgttgcaaaaa(caatataatttttgatctggaaatggttttaaaccttaacctttcatatattaataattttttttcaaaa -320 



aaacgttctaaaaaggttcctcattttttcaatataggaaattttgaaga -315 

SL2 

M S E K T F H K 8 

tcttttccaaaaatgaggttcttcgcttgaaaagccaacatttaaaacctttttttttccagaaacctagtggttaATGTCTGAAAAGACGTTCCACAAG -3057 

A Q T I R A K A S G V P S I V E A V Q F H G V R I T K R D A L V K E 42 
GCACAGACCATCCGTGCAAAGGCATCCGGAGTGCCTTCAATCGTCGAAGCTGTACAGTTTCATGGAGTTCGCATCACAAAAAACGATGCTTTGGTTAAGG -2957 

V S E L Y R 48 

AGgtactacccaaatttcaaaatgttgcacaattcaattgaaaatataaattgtgaattaaattcaacttacatgttttttcagGTTTCCGMTTATACA -285 

S K N L D E L V H H S B L A A R H L Q E V G L M D N A V A I I D T 81 
GAAGTAAAMTCTAGATGMCTTGTTCATAACTCTCATCTGGCGGCTCGTCATCTTCAAGRAGTT GGATTAATGGATAATGCAGT TGCTCTRATTGATAC -275 

* SHP183 

SPSSNEGYVVNFLVREPKSFTAGVKAGVSTNGD 114 
ATCTCCAAGCTCAAATGAAGGATATGTTGTCAATTTCCTAGTTCGAGAACCAAAATCATTCACTGCTGGAGTCAAAGCAGGAGTTTCAACGAATGGAGAT -26 

A D V S L N A G R Q S V G G R G E A I N T Q Y T Y T V K 14 

GCGGATGTCAGTTTAMTGCCGGAAMCAMGTGTTGGA GGACGAGGAGAGGCAATCAAT ACACAGTATACATATACTGTAAAGgtaaggacgagagttg -255 

f SHP145 

gcactgccagtttggcatgttctcccaatattttttaattataaaatttggaagtataaaaaaatgtttgcttcatctaaaaatagcctttttcacatga -245 



aaaaaattgaaaaaaagtgctcaaaaatttcagaaatttccaatttccaaacaattttggagaactttcaaaaatttttccaactgaaattaaagctata -235 






WO 99/10482 



PCT/CA98/00803 




gop-3 continued... G D H C F 147 

ttctatcactaaattttatacaagtcttaagagaaaatgatgaagtggctcattttgtagaatttcctaaaaaataatatcttcagGGCGATCACTGCTT -225 



N I S A I K P F L G B Q K Y S N V S A T I Y R S L A H H P W H Q S 180 

C AACATTTCCGCMTCAAftCC ATTCCTGGGATGGCAAAMTATTCGMTGTATCAGCGRCTCTATACCGTTCACTTGCACATATGCCA TGGAATCAATCR -215 
SHpi * T SHP146 

DVDEHAAVLAKNGQLHNQRLLHQVKLNA 208 

GATGTTGATGAGAATGCAGCTGTTCTTGCATATAATGGACAACTATGGAATCAAAAGCTTTTGCATCAAGTCAAATTGAATGCGgtaaagtattataagt -205 



I W R T L R A T R D A A F S V R E Q A G H T L 23 

gttttgtccaaactatgatacagttcttcagATATGGAGAACACTTCGTGCCACTCGAGATGCCGCATTTTCAGTTCGTGAACAAGCCGGACACACTTTG -195 

K F S L E N A V A V D T R D R P I L A S R G I L A 25 

AAATTCTCGTTGGAGAATGCTGTAGCTGTTGATACAAGAGATAGACCTATTCTTGCAAGTCGTGGAATTCTTGgtaagagtaacaacgactatttttaaa -185 



aaatatctttttcgaaaaaattacgaacgaaaaaaaactgtattatgtacccaaacgcgaaattttgcagttcttgcgcgttcttgttgataaaaaatat -175 

R F A Q 26 

gtaaaaaattggaaaaactacgaaaagtcgataaaaattccgtaccaaccggaaaatgtttcattaatttctcttccttttttcagCTCGTTTTGCTCAA -165 

EYAGVFGDASFVKNTLDLQ 279 

GAGTACG CAGGAGTATTTGGTGATGCGT CATTTGTGAAGAATACATTAGATTTACAGgtaacaaccttatttcaacaattatttcaaattctattaaaaa -155 
SHP139 T 

A A A P L P L G F I L A A S F Q A K H L K G L G D R E V H I L 31 

taattccagGCAGCTGCCCCTCTTCCACTCGGTTTCATTCTTGCCGCCTCATTCCAAGCGAAACATTTGAAAGGACTCGGAGATCGAGMGTTCATATTT -145 

SHP140 T 
DRCYLGGQQDVRGFGLNTIG 330 
TGGATAGATGTTATTTGGGT GGACAACAGGATGTTCGAGGATTTGGTCTGAATACTATTGGAgtgagttttaacgaaattctcttgaaagtcaaataatc -1357 
T SHP184 

V R A D N S C L G G G A S L A G V V H L Y R P L I P P H H L F 361 
attttcagGTTAAAGCAGATAACAGTTGTCTTGGAGGAGGTGCTTCACTTGCTGGTGTCGTTCATTTGTATCGGCCATTGATTCCACCAAATATGCTATT -1257 

A H A F L A S G S V A S V H S K N L V Q Q L Q D T Q R V S A G F G 394 
TGCACACGCATTCCTTGCATCTGGAA GTGTTGCATCAGTTCATTCC AAAAATTTGGTGCAACAATTACAGGATACTCAACGAGTATCAGCCGGATTTGgt -1157 

SHP163 1 

gagtttgaaatttaggaaacatttqgatgaaatgtattttttaaaaatagatcagctttatttatttgaaaaaaaacgctcattaatcaatagtgatagt -1057 

tccattctgagtttcttcttcttcctcgcggaatacaatttttgacttgttcgcatccttcttgtgtactttgtcaccaatcttctcatcaactaaatct -957 

cgaaactgaaaaaatttcaaaattattccaaaaaatattgatgcagactacctttttgatggcttctggtacgtttctagcgtcgaatggattggctcct -857 

ccaataattaaagtctcgttcggtagtttagccagacggacggtgtgcttcaacatttttctaattaatctatttcaattcaagtcactcactctctctt -757 
gop-3 continued... 

gacgtcttcttctatattccaagaactctgcagaaaatccgtgtccgccttgtgtgtttctagttggcgtcggaggattcacgggtccaagacgaatgga -657 

tgtctaaaaaatgttatatttttgcataaagaaaacaccataccttcaccactttttgagttgtgggcgttctgaatggaattgatcgattattattgct -557 

ctttcttgatttgcttctatcagctgcgtaatgaggtgttctaaagatcagctttaattcatttggacaagtgctcctctaataaacttaccctgtactc -457 

atttttgaaacgatttacgatgataagattgaaagtggaagttaaatttagtctttcaaagttgaaataaaatcttcataaataaataaatttaaatgaa -357 

L A F V F K S 401 

agattaaataaattaacgttcacgtagttaaaaaaataatttaaatcttaaacttctaataaaaaatctcaattttccagGACTCGCATTCGTGTTCAAM -257 

I F R L E L N Y T Y P L K Y V L G D S L L G G F H I G A G V N F L 434 
GTATTTTCCGGCTGGAACTCAACTACACGTATCCATTGAAATATGTGCTCGGCGATTCATTGCTCGGTGGATTCCATATTGGAGCTGGTGTCAACTTCTT -157 

Gtagaga ttaattggatgcaagcacccct caaaaagatttttttgaaaaacgataaattcacagaatttcagttctttttctcccccttttattgttatt -57 
SHP134 ? 

ttcatcgtaa tgctgtgctagaagtcagag taaatatgagtttttttgtgttctaggaattccattttttcaggaagcaaatttaataaaaattatcgaa 44 
SHP164 T 

polyA 

r 

tttcttgctctaaagatgttgtacattttat ggaaatqttcgtatagtaa 94 

* SHP135 
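The row-end labels of the panel above pass from -57 to 44 across a single 100-base row, which suggests that the figure numbering has no position 0. The following is a minimal sketch, not part of the original disclosure, of how such labels could be converted to base positions in SEQ ID NO:1. The constant `OFFSET` and both function names are assumptions: the value 10657 is inferred from comparing a few row ends with the sequence listing (for example label -57 with listing position 10600) and should be checked against the original drawings.

```python
# Hypothetical helper: convert figure row labels to SEQ ID NO:1 positions.
# OFFSET is an inferred value, not one stated in the document.
OFFSET = 10657  # assumed listing position of figure position +1


def figure_label_to_listing(label: int) -> int:
    """Map a figure coordinate (no zero position) to a 1-based listing position."""
    if label == 0:
        raise ValueError("the figure numbering is assumed to have no position 0")
    # Negative labels count back from OFFSET; positive labels start at OFFSET.
    return label + OFFSET if label < 0 else label + OFFSET - 1


def listing_to_figure_label(position: int) -> int:
    """Inverse mapping from a 1-based listing position to a figure coordinate."""
    relative = position - OFFSET
    return relative if relative < 0 else relative + 1


for label in (-3057, -57, 44, 94):
    print(label, figure_label_to_listing(label))
```

Under these assumptions the labels -57 and 44 map to listing positions 10600 and 10700, exactly 100 bases apart, which is consistent with the 100-base rows of the figure.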
SL2 

MSLRKINFVTG 11 
ttcgaacactttatatttctcgttttaaaactgtcggtgttttatagtaaactatcttcagaaaaaaATGAGCCTACGA AAAATCAATTTCGTAACTGGA 194 

? SHP118 * 



NVKKLEEVKAILKNFE 27 
AACGTGAAGAAGCTTGAAGAAGTCAAGGCTATTTTGAAGAATTTCGAGgtaaaatatatttgatattattcgaacgcgaaattttgcgccaaaagtacga 294 



tgcctggtctcaacacgacaatattttgttaaatacaaacgaatgtgcgccttcaaagaaaagtttcaatctttcgttgccgtggagatatttttagagt 394 



VSNVDVDLDEF 38 

ttttgtttaaattatatatttgtcgtatcgaaaccgggtaccgtaatcaatcaattaaatattttcagGTTTCAAACGTGGATGTCGATTT GGATGAATT 494 

SHP165 

Q G E P E F I A E R K C R E A V E A V K G P V L 62 

CCAAGGAGAACCCGAATTTATTGCCGAAAGAAAGTGCCGTGAGGCTGTTGAAGCTGTAAAAGGGCCCGTTTTGgtatggaaaattgtatttgttctaaaa 594 



VEDTSLCFNAMGGLPGPYIKWFLKNLKPE 91 
attgtcaaatttcagGTCGAAGACACAAGTTTATGCTTCAACGCAATGGGCGGTCTTCCTGGACCTT ATATCAAGTGGTTTTTG AAGMTTTGAAACCAG 694 

▼ 

SHP129 
hap-1 continued... 

G L H N M L A GFSDKTAYAQCIF 111 

AAGGACTACATAATATGCTAGgtaaatattttaattttttgaaaaaacttatttttcagCCGGATTTTCTGACAAAACCGCCTATGCTCAATGCATCTTT 794 



AYTEGLGKPIHVFAG 126 
GCGTACACTGAAGGACTCGGAAAACCTATTCATGTATTTGCTGgtatgattttttgaatttaattctttaattttatatgttaatttagttgtttcattc 894 



K C P G Q I V A P R G D T A F G W D P 145 
ctcaatttatgagagatttttttttcaatttttctatttcagGAAAATGTCCTGGTCAAATTGTTGCTCCACGT GGTGATACTGCTTTTGGATGG GATCC 994 

T SHP130 



CFQPDGFKETFGEMDKDVKNEISHRAKALELLK 178 
ATGCTTCCAGCCAGATGGTTTTAAAGAAACATTCGGAGAAATGGATAAAGATGTAAAAAATGAAATTTCTCATCGTGCAAAGGCTCTGGAACTCCTCAAG 1094 



T SHP119 SHP120 * 



E Y F Q N N • 184 

GAATATTTTCAGAATAATtaaattattttttctcatctatgcaatttcttgaaaatttgttaagtttccgttgttatgcatttgcttttatttaaaaaaa 1194 



r 

aaagaatatttttacattaatattagatatgagaaaagagtaatttctggattttaaccttcctacaaaagaatatttatattttttgtatgatttttta 1294 

SHP93 f 
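In the panels above, exon bases are printed in upper case and intron bases in lower case, with the one-letter amino acid translation set over the coding rows. The following minimal sketch, not part of the original disclosure, shows how a mixed-case fragment can be split into exons and introns, checked for the usual gt...ag intron boundaries, and translated with the standard genetic code. The function names are hypothetical; the fragment is taken from the hap-1 rows labelled 594 and 694, starting at the first complete codon.

```python
import re

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_segments(mixed: str) -> list:
    """Split a mixed-case string into ('exon', text) and ('intron', text) runs."""
    return [
        ("exon" if run.isupper() else "intron", run)
        for run in re.findall(r"[A-Z]+|[a-z]+", mixed)
    ]


def translate(coding: str) -> str:
    """Translate complete codons of an upper-case coding sequence."""
    usable = len(coding) - len(coding) % 3
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


# Exon / intron / exon fragment copied from the hap-1 panel.
fragment = (
    "CAAGGAGAACCCGAATTTATTGCCGAAAGAAAGTGCCGTGAGGCTGTTGAAGCTGTAAAAGGGCCCGTTTTG"
    "gtatggaaaattgtatttgttctaaaaattgtcaaatttcag"
    "GTCGAAGACACAAGTTTATGC"
)

segments = split_segments(fragment)
introns = [text for kind, text in segments if kind == "intron"]
coding = "".join(text for kind, text in segments if kind == "exon")

# Introns are expected to begin with gt and end with ag.
print(all(i.startswith("gt") and i.endswith("ag") for i in introns))
print(translate(coding))  # QGEPEFIAERKCREAVEAVKGPVLVEDTSLC
```

Joining the exons before translating matters here because a codon may be interrupted by an intron elsewhere in the figure; translating each upper-case run separately would then shift the reading frame.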

SEQUENCE LISTING 
(1) GENERAL INFORMATION: 
(i) APPLICANT: McGILL UNIVERSITY 

(ii) TITLE OF INVENTION: THE C. ELEGANS gro-1 GENE 

(iii) NUMBER OF SEQUENCES: 62 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: SWABEY OGILVY RENAULT 

(B) STREET: 1981 McGill College Avenue - Suite 1600 

(C) CITY: Montreal 

(D) STATE: QC 

(E) COUNTRY: Canada 

(F) ZIP: H3A 2Y3 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: Windows 

(D) SOFTWARE: FastSEQ for Windows Version 2.0b 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: CA 2,210,251 

(B) FILING DATE: 25-AUG-1997 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Cote, France 

(B) REGISTRATION NUMBER: 4166 

(C) REFERENCE/DOCKET NUMBER: 1770-179PCT FC/ld 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 514 845-7126 

(B) TELEFAX: 514 288-8389 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14458 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
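The sequence rows of this listing are laid out as six ten-base groups followed by a running base count. In the scanned text many rows carry stray spaces inside the groups and inside the count. The following minimal sketch, not part of the original disclosure, shows one way such a row could be normalised and checked against the count of the preceding row. The function name `normalise_row` and its parameters are hypothetical, and the sketch assumes that a row contains only the letters A, C, G, T and a trailing number.

```python
import re


def normalise_row(raw: str, previous_end: int = 0, group: int = 10) -> str:
    """Remove stray spaces from one listing row and regroup it in tens.

    raw          -- the row as scanned, e.g. 'AAT C CAT CT T ... 4 80'
    previous_end -- running count printed at the end of the preceding row
    """
    compact = re.sub(r"\s+", "", raw)
    match = re.fullmatch(r"([ACGTacgt]+)(\d+)", compact)
    if match is None:
        raise ValueError(f"unrecognised row: {raw!r}")
    bases, end = match.group(1).upper(), int(match.group(2))
    # The running count must grow by exactly the number of bases in the row.
    if end - previous_end != len(bases):
        raise ValueError(
            f"row ends at {end} but holds {len(bases)} bases after {previous_end}"
        )
    groups = [bases[i:i + group] for i in range(0, len(bases), group)]
    return " ".join(groups) + f" {end}"


row = "GTCAATATCC TTTTGTACTA AAT C CAT CT T GAT ATT CAAT AT AT CT TT GT CAGTATAGTA 4 80"
print(normalise_row(row, previous_end=420))
# GTCAATATCC TTTTGTACTA AATCCATCTT GATATTCAAT ATATCTTTGT CAGTATAGTA 480
```

The length check is the useful part of the sketch: a row that has lost or gained a base in scanning fails loudly instead of being silently regrouped.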

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

GCAAAATTTG CTAAGATGAA GCGCCGGCTT GTTACATTGC TTTTCAGAGT CGATTGGTTC 60 
AAAATTGTCA ATTTTATCCA AAATAGAGTG CATTGTGTGT ACAATAACTA AAGAATCATC 120 
CATATCTGGT CCAACACAAC ATTGATGGAA TACTGGATCA ATTGTCTAAA AAAATATCAA 180 






TAGAATAATG AAACATTTTC AGAATTCATT ACCGTCAATG TCAGATAGTC ATTCCTTGAG 240 

TATTTTGTGG ATGCTTTGAA AATTCTTCGC TGGGCCATAT CTGTTGGATA ATCTGAAAAA 300 

CGCAATAAAT TTCATCGAAA ATGCCTATTA AATTGAATTA CCTTCTTCTT CATCATTTCC 360 

TAACAATTCA TGCTCTTTTT GTGCTTGACT TGTGACCAAT TCTTTAAATT CAATTAAATC 420 

GTCAATATCC TTTTGTACTA AATCCATCTT GATATTCAAT ATATCTTTGT CAGTATAGTA 480 

TTCAGCGTAT CTGAAATTTC GAATTTATTT TTCTAATTCC CAAGAAAAAT AATTAATAAG 540 

AATACCTTAA CGAATTATTA TCCAATATAT CATCATTTGC CACATCTGGA AGACGCTGAG 600 

GAACTGTTTG AGCAGCTTGG AGGTAGTCGT CATCGTCTCT GGAAATTGTT ATTTTCAATT 660 

TCAAAAAAAA AACTTTACTT ACGAAATATA CTCATTTGAT GCAATCCACG GATCAAAACG 720 

ACGTCTTTGC ATCTTTGAAT CATTTTCCGC ATGGCACCGC ATCACTTCTT TCTTATGATT 780 

ATTTTCTAAC GTTTTTGAAA ATTCGACGTG CTCTTCACAA CGGCCGCCAT GTTTCGCAAG 840 

TTCTTCTTTT GATCGTATCT AAAATTTTAA ATTTGAAAAA AAGCTTACTA TCAAATTTTC 900 

GTATTTTTTC TCACCTGCTT ACACCGAACA AGCGTTCGAT ACGAAGCATA ATTACATTGT 960 

CCATACTTAT TTTTGTCGTA TTCATTGGCA ACAAGACGGA ATCGTGTTCC AGGTGCAACT 1020 

ATATATTGAG CAGGAGGACG AGTTGTTTGT TTCATGCTGC TTAAAAATAA AAATGGAAAA 1080 

TTGAGTCAAA AAGTTGAGAT AAAACAAATT AAAACAATTT TCTGAAAAAT AAACAACTGA 1140 

AATTTGAAGT AATAAACAAC ACGCGAAAAC GTTATTTCGG AGCATCGTTT GAGAAGTAAA 1200 

ACTTTTTTTC GGCGCACCCT TGTGCGCAGT TTTTATCTTC TCTTTTAATT TAATTTTCAA 1260 

GCTAAATCTT TCTTTTTAAA CTTTGAATAA ATATTTAAAT ATTCAGAATG CACCAATAAA 1320 

CCTGGAACAA AATCGATAAT GTTCCGCAAG CTTGGTTCTT CTGGGTCACT ATGGAAGCCG 1380 

AAAAATCCGC ATTCTTTGGA ATACCTCAAA TATTTACAAG GAGTGCTCAC AAAAAATGAG 1440 

AAAGTTACGG AAAACAATAA GAAAATATTA GTAGAAGCAT TACGAGCTAT CGCAGAAATT 1500 

CTCATTTGGG GCGATCAGAA TGATGCTTCG GTTTTTGAGT GAGTTTTTTT CCAATGTTTT 1560 

TTTTCAAATC TGATGTTGAA TTTCAGTTTC TTCCTTGAGC GGCAAATGCT TCTTTATTTC 1620 

TTGAAAATTA TGGAACAAGG AAACACACCA CTAAATGTAC AATTACTGCA GACTTTGAAC 1680 

ATTTTATTCG AAAATATTCG ACATGAAACT TCACTTTGTA AGTTTTTTAT ATGGATTTTC 1740 

GCTTAAAATT GCCAGTTTTC AGATTTCCTT CTAAGTAACA AT CAT GTAAA CT C GATTATT 1800 

TCCCACAAAT T CGAT TTACA AAATGAT GAG ATCATGGCTT ACTACATTAG TTTTCTGAAA 1860 

ACTCTTTCAT TTAAACT GAA TCCAGCTACA ATCCACTTCT TCTTCAATGA AACGACTGAA 1920 

GAATTT CCAT TGTTGGTAGA AGTTTT GAAG CT TT AT AAT T GGAATGAATC AATGGTTCGA 1980 

ATTGCTGTTA GAAATATT CT TTTAAATATT GTGAGAGTTC AAGAT GAT T C AAT GATTATT 204 0 

TTCGCTATCA AGCATACAAA AGTTAGTAGA AAATTATTTT GAAAAGGTGT ATTTAAGCAA 2100 

TAAATATTAC AGGAATAT CT AT C GGAGTTA ATAGATTCTC TAGTTGGTCT CTCACTTGAA 2160 

AT GGACACAT T T GTAC GAT C T GCT GAGAAT GTGTTAGCTA AT C GAGAGAG AT T AC GAGGA 222 0 

AAAGTGGATG ATTTAATTGA TTTGATTCAT TATATTGGTG AACTATTGGA TGTGGAAGCT 22 8 0 

GTCGCCGAAA GTT TAT CAAT TTTAGGTCAG TTTTACTGCT GGAAAAT CAA GTTTTTAATG 234 0 

TTAAATTTTC AGTAACAACA C GATACTT AA GCCCTCTATT ACTTTCAAGT AT AT C AC CAA 24 00 

GAAGAGATAA TCATTCACTT CTACTCACTC CGATTTCTGC GTTATTTTTT TTCTCTGAAT 24 60 

TTTTATTGGT GAGTTTTAAC ATTTAAAATT ACATTTTTCT AATT T ATTTA TTTTTCAGAT 2520 

AGTTCGTCAC CAT G AAAC AA TATATACATT TTTATCATCT TTCCTATTTG AC AC T C AGAA 2580 

TACTTTGACG ACCCATTGGA TACGT CATAA T GAGAAAT AT TGCTTAGAAC CGAT TACAT T 2 640 

AT CAT CACCA AC CGGAGAAT ATGTGAATGA AGACCAGTAA GAGCT GAAAT TTTAAAATTT 2700 

TTGCTTTGAA T AT AGTAT TT TCAGCGTATT TTTCGATTTT CTACTGGAAG CAT T T GATT C 27 60 

CAGT CAAGCA GACGATT CGA AGGCATTCTA TGGATTAATG CT GATTTATT CAATGTTTCA 2820 

GAATAAT GGT GAGTTTTAAA AAATTGATTT GTTAAATTAA AATTT CCATT TCCAATAACT 2 880 

CCTCTTCAGA CAGTAAGTTT T CAAT GTT GT AAAGTTCCTG TTCATCTGTG ATCGTTTTCT 2940 

TCATTTTTTT AGTTTTGCAT GAACAGTTTT CAAATTTTTT T GAT AT CAT A CAGTAAATAT 3000 

CGTCATCCAG ATAATTTT CT ATTTAAAAAA AAT GAAT AAA AAGAGGGCGC GCAGAAATTG 3060 

C C GAAGT AAT GTAAATTTAA AGGGACACAT GCGTAGCTTG TTGTGTGGGT CTCGCCGCGC 3120 

TTTGTTTGAT TTATCTTGTT TTCTGCTCAA AGAGCTGTTT TTATTTTAGC GTT GAAT GCT 318 0 

TTTTTACCGT TCTCATCGGC TTTTTAATAG GAATATTTAA AAAAAAAGGT TTAATAAATC 324 0 

TTCGTTTTTA CAAAATCCAT CTAAGATTTG CATTT GT GAA GCTCAACAAG TAAAGTTTTA 3300 

AGTAACATTG TTTTTTAAAA AACAATT GAA CCAAATTTTG CCGAAACATT AAT AAC AT GA 33 60 

C GAT ACT CT A TAAAATATTC CTCTTTTCAA AATAAATTTT CAAAAAAAAT CCATTTTTCA 342 0 

GC CGAT GTT G GAGAACTT CT ATCTGCTGCC AACTTCCCAG T GCT CAAAGA AT CAAC GAC A 34 8 0 

ACTTCATTAG CT CAACAGAA TCTTGCTCGT CT C C GAAT AG CAT CT ACGT C TT CCAT AT CA 354 0 

AAGC GAACGA GAGCTATCAC TGAAATTGGA GTAGAAGCGA CCGAGGAAGA TGAGATTTTT 3600 

CAT GAT GTT C CTGAAGAACA AACGTTGGTA AGTAAATAAA T CAACATT GA TT GTT ACACA 3660 

AACTTTAATA TTTTTAAATT TGAAAATTTT CTTCAAAGTG CTCAAAAATC CTGTCGAAAA 372 0 
TTACAGGAAG ATCTGGTGGA TGATGTATTG GTTGATACTG AAAATTCAGC AATAAGTGAT 3780 

CCAGAAGTGA GTAGAAAACG T G CAT GTAT T AAT TAT T AAA AAAAAAATAT AGTTTTCCCC 384 0 

AGTTTTCCTT GACCTAAAAC TCAGCAATTT CAGCCTAAAA ACGTGGAGTC AGAATCTCGT 39 00 

TCTCGATTTC AATCTGCTGT T GAT GAGCTT CCACCTCCGT CGACTTCTGG ATGTGATGGT 39 60 

CGACTTTTTG ATGCACTTTC AT C GAT TAT C AAAGCAGTTG GAACAGATGA CAATCGAATT 4 02 0 

C GAC CAATTA CATTGGAACT TGCATGTCTT GTAATTCGGC AAATTTTAAT G ACT GT T GAT 4 080 

GAT GAAAAAG T AAGAT T AC A AATT CAAAAT TGAGCAAAAT CAGAAT CTAA AT TT CAT AAA 414 0 

TTGTTCAGGT AC AT AC CAGT TTAACGAAAT TATGCTTCGA AGTTCGTCTA AAACTTTTAT 42 00 

CATCAATTGG ACAAT AT GTT AAT GGAGAGA ATCTGTTTTT GGAGTGGTTT GAG GAT GAAT 42 60 

AT GCAGAATT T GAAGTAAGC CAAGAGGTCC GAAAAT AAT T T AAT T CAT C C TTTTTATTCA 4320 

GGTGAATCAC GTGAATTTCG AT AT AAT C GG TCACGAAATG CTTCTTCCTC CAGCTGCAAC 43 80 

TCCTCTTTCG AATCTGCTAC T T CAT AAGCG ATT G CC CAGT GGATTT GAAG AACGAATAAG 444 0 

AACTGTAGGA AACTTTTTAA ATT TGAAAAT T AAT TAT AT A TATATTTGCA GCAAAT CGTA 45 00 

TTCTACCTAC AT AT T C GAAA ATTGGAACGA GAT TT GAC C G GT GAAG GAGA CACAGAATTA 4560 

CCTGTGAGAG TGTTGAATTC T GAT CAGGAA CCAGTTGCCA TCGGTGATTG TATTAATTTA 4 62 0 

CGTGAGTTCA TCTGCATAGA AAAC AC CAT A TTTCTACTCA AAT T AACAAT TTTCAGATAA 4 68 0 

TTCGGATCTT CTATCCTGCA CTGTGGTTCC TCAACAACTA TGTTCTCTTG GAAAACCTGG 474 0 

TGATCGTCTT GCTCGATTCC TTGTCACTGA TAGACTTCAA TTAAT T CT T G TCGAACCGGA 4 8 00 

TTCTCGAAAA GCCGGATGGG CAATTGTTCG AT T C GT AG G A CTTCTTCAAG ATACAACAAT 4 8 60 

T AAT G GAG AT TCTACGGATT CGAAAGTTTT GCATGTTGTG GTGGAAGGGC AACCCTCGAG 4920 

AATTAAGGTA AGAATACTAA CGGGAAAAAA AAATCAAAAA ATTACTTCTG TT T CAGAAAA 4 98 0 

GACATCCGGT TTTAACTGCA AAGTT CATAT T C GAT GAT C A CATTCGGTGT AT GGCAGCAA 504 0 

AGCAACGGCT CACCAAGGTA ACGGAAAAAA TAACCAAAAA GACGGAAAGT TATTGTAAAT 5100 

GGACGAAATC GGC GAAATTA ATT GAAAAC G TTTGAATTTG CCGCTAAAAC CAAACGAAAA 5160 

C C AAAC GAAA GCGAAATTTA ACTATCCCTT CAGGTAGAAT ATACATTTTA TTTCTCTTTA 5220 

TAGGGTCGCC AAACAGCACG TGGTCTGAAA CTTCAGGCGA TAT GT T C AG C TCTTGGAGTT 52 80 

CCACGTATCG AT C CAGC GAC AATGACGTCA TCACCACGAA T GAAT C C ATT CAGAATT GT G 534 0 

AAAGGATGCG CACCGGGAAG T GT AC GAAAA ACTGTTTCCA CAT CAT CAT C GTCAAGCCAA 54 00 

GGACGTCCCG GACATTATTC TGCAAATCTT AGATCAGCAT CTAGAAAT G C AGGAATGATA 54 60 

C C AG AT GAT C CAACTCAACC GAGTAGT T CT TCGGAAAGAA GATCCTAGGG AT CAATAT CT 5520 

CTTCAGTTTC AT CATTTTAT GCTGTAAATT GTATTTAAGT ATTCCTATTC TTTGTAGTAC 558 0 

T GT AT TT ACA CAT C GT CT AG TTAAAAT CAC AAATCTCCGA AAAAACAAAC CAGT GAACAT 564 0 

GT GAT ATT T C TCTTGCCCAT AGTTCTCTTT TTTTTTTGAA ACAAAAACAA TTACTTTTAT 57 00 

GCTCACCTAT T C GAG C CAT A TTTTTTTCCC AATTACCGGT TGTTTATTTT AATTTCTTTT 5760 

TTTTTTCTGT AAATCTACTT TATTTTTAAA ACTGCATTTG AGATT GT GT A TATTTTTTCA 5820 

AAATGGTTCA AAT GCCGAAT CTATCTACTT TTTAATCATT AT T CAAAC AG AAAAAC C GAT 58 8 0 

T AT TT AT T C A GATTCTCAAA AATGGCTGAA AAAG CT GAAA AT CTT C CAT C TTCTTCGGCC 594 0 

GAAGCTT CAG AAGAG C CATC ACCTCAAACT GGACCAAATG TGAATCAAAA ACCATCGATT 6000 

TTGGTTCTTG GAATGGCTGG TTCTGGAAAA AC GACATT T G TT CAG GTAAC TT T CAT T CAA 6060 

TT TT GAGAGT T TT CAAACAT TACTATTTTC AGCGTCTCAC AGCATTCCTA CAT G CT C GT A 6120 

AAACACCTCC ATATGTGATT AATCTGGATC CGGCAGTTAG CAAAGTAC CT TAT C CAGT GA 6180 

AT GTT GACAT TCGAGATACT GT GAAATACA AGGAAGTTAT GAAAGAATTC GGAAT GGGAC 624 0 

CAAAT GGAGC AAT TAT GAC A TGTCTTAACC T GAT GT GT AC TCGTTTTGAT AAAGTAATTG 6300 

AGTT GATTAA TAAGAGATCT TCTGATTTCT CAGTTTGTCT TCTTGATACT CCTGGACAAA 6360 

TT GAAGCATT CACTTGGAGT GCTAGTGGAT CTATTAT CAC TGATTCATTG GCAAGTAGCC 6420 

ATCCCACGGT AAGGGATTTT GATTTAT GAA ATCTGCTTGA AATGAAAAAA GATTCTAATA 6480 

AATTTTTGAC T TT TAAACAT TTTTTACAGT TATATTTGGT CTATTTTCTA TCATTAAAAG 654 0 

CAAAAT GAAA AGTCGATTCT ACTCCATATT TATTAATTTC GACTTTT CAG GTGGTAATGT 6600 

ACATTGTGGA TTCCGCTCGT G C C AC AAAT C CAACTACATT CAT GT C CAAT ATGCTCTACG 6660 

CAT GT T C CAT TCTCTACCGT ACCAAACTTC CAT T CATT GT CGTTTTCAAC AAAG CT GAT A 672 0 

TTGT CAAAC C AACATTT GCA CTCAAATGGA TGCAAGATTT CGAAAGATTT GAT GAAGCTT 678 0 

TAGAGGATGC CAGAAGCAGT TAT AT GAAT G ATTTGAGTCG TTCATTGAGT CTCGTTCTTG 6840 

AT GAATT CT A TTGCGGACTG AAAACAGGTT TTTATTCGAA ATAAAAC CT T TTTTAAATAA 6900 

T AAAT TT CAG TTTGCGTCAG TTCTGCAACT GGAGAAGGAT TCGAAGATGT AAT GACAGCA 6960 

AT C GAT GAAA GTGTTGAAGC ATACAAAAAA GAAT AT GT T C CAAT GTAT GA AAAAGTGTTG 7 02 0 

GCTGAGAAAA AACTATTGGA TGAGGAGGAG AGAAAGAAAA GAGATGAAGA GGTAATTGTA 708 0 

GTAATTTAAT T CT GATTATC TTCAAATTTT CAGACT CT GA AAGGAAAAGC T GTT CAC GAC 7140 

CT GAACAAAG TCGCCAATCC CGACGAATTT CTGGAGTCGG AGTT GAATT C AAAAAT C GAT 72 00 

AGAATTCATT TGGGCGGAGT C GAT GAAGAG AAT GAG GAGG AT GCT GAACT CGAAAGATCC 7260 
TGATTTTCTT TTTGTTTTTG AATTTTTATT CTATTTTGAT CCCTGTTTAC TTCTTATTGT 7320 

TCTCATTTTG TTGCGTTGTT TTACATTTTA CTCATTTTTG CATAAACTTG TTGCAAAAAT 738 0 

CAATATAATT TTTGATCTGG AAATGGTTTT AAACCTTAAC CTTTCATATA TTAATAATTT 7440 

TTTTTCAAAA AAACGTTCTA AAAAGGTTCC TCATTTTTTC AATATAGGAA ATTTTGAAGA 7500 

TCTTTTCCAA AAAT GAGGTT CTTCGCTTGA AAAGCCAACA TTTAAAACCT TTTTTTTTCC 7560 

AGAAACCTAG TGGTTAATGT CTGAAAAGAC GTTCCACAAG GCACAGACCA TCCGTGCAAA 7 620 

GGCATCCGGA GTGCCTTCAA TCGTCGAAGC TGTACAGTTT CAT GGAGT T C G CAT C AC AAA 7 680 

AAACGATGCT TTGGTTAAGG AGGTACTACC CAAATTT CAA AATGTTGCAC AATTCAATTG 774 0 

AAAATATAAA TTGTGAATTA AATTCAACTT ACATGTTTTT TCAGGTTTCC GAATTATACA 7 80 0 

GAAGTAAAAA TCTAGATGAA CTTGTTCATA ACTCTCATCT GGCGGCTCGT CAT CTT CAAG 7 860 

AAGTTGGATT AAT G G AT AAT GCAGTTGCTC TAATTGATAC ATCTCCAAGC TCAAATGAAG 7 920 

GAT AT GT T GT CAATTTCCTA GTTCGAGAAC CAAAATCATT CACTGCTGGA GTCAAAGCAG 798 0 

GAGTTTCAAC GAAT GGAGAT GCGGATGTCA GTTTAAATGC CGGAAAACAA AGTGTTGGAG 8 04 0 

GAC GAG GAGA GGCAATCAAT ACACAGTATA CAT AT ACT GT AAAGGTAAGG AC GAGAGTT G 810 0 

GCACT GCCAG TTTGGCATGT TCTCCCAATA TTTTTTAATT ATAAAATTTG GAAGTATAAA 8160 

AAAATGTTTG CTT CAT CTAA AAAT AGC CTT TTTCACATGA AAAAAATTGA AAAAAAGTGC 8220 

TCAAAAATTT CAGAAATTT C CAATTTCCAA ACAATTTTGG AGAACTT T C A AAAATTTTTC 82 8 0 

CAACTGAAAT TAAAGCTATA TTCTATCACT AAAT T T TATA CAAGT CTTAA GAGAAAATGA 834 0 

TGAAGTGGCT CATTTTGTAG AATTTCCTAA AAAATAATAT CTTCAGGGCG AT CACT GCTT 8400 

CAACATTTCC GCAAT CAAAC CATTCCTGGG ATGGCAAAAA T ATT C GAAT G TATCAGCGAC 84 60 

TCTATACCGT TCACTTGCAC AT AT GC CAT G GAAT C AAT C A GATGTTGATG AGAATGCAGC 8520 

TGTTCTTGCA TATAAT GGAC AACTAT GGAA TCAAAAGCTT T T G CAT CAAG TCAAATTGAA 858 0 

TGCGGTAAAG TATTATAAGT GTTTTGTCCA AACTATGATA CAGTTCTTCA GAT AT GGAGA 8 64 0 

ACACTTCGTG C CACT C GAGA TGCCGCATTT TCAGTTCGTG AACAAGCCGG ACACACTTTG 8700 

AAATTCTCGT TGGAGAATGC TGTAGCTGTT GATACAAGAG ATAGACCTAT T CT T G CAAGT 87 60 

CGTGGAATTC TTGGTAAGAG T AACAAC GAC TATTTTTAAA AAATATCTTT TTCGAAAAAA 8 820 

TTACGAACGA AAAAAAACTG TATTAT GT AC CCAAACGCGA AATTTTGCAG TTCTTGCGCG 88 80 

TTCTTGTTGA TAAAAAATAT GTAAAAAATT GGAAAAACTA CGAAAAGTCG AT AAAAAT T C 894 0 

CGTACCAACC GGAAAAT GT T TCATTAATTT CTCTTCCTTT TTTCAGCTCG TTTTGCTCAA 90 00 

GAGTACGCAG GAGTATTTGG TGATGCGTCA TTTGTGAAGA ATACATTAGA TTTACAGGTA 9060 

ACAACCTTAT TTCAACAATT ATTTCAAATT CT AT TAAAAA TAATTCCAGG CAGCTGCCCC 912 0 

TCTTCCACTC GGTTTCATTC TTGCCGCCTC ATTCCAAGCG AAACATT T GA AAGGACTCGG 918 0 

AG AT C G AG AA GTTCATATTT TGGATAGATG TTATTTGGGT GGACAACAGG ATGTTCGAGG 924 0 

ATTTGGTCTG AATACTATTG GAGTGAGTTT TAACGAAATT CTCTTGAAAG T C AAAT AAT C 93 00 

ATTTTCAGGT TAAAGCAGAT AACAGTTGTC TTGGAGGAGG TGCTTCACTT GCTGGTGTCG 9360 

TTCATTTGTA TCGGCCATTG ATT C CAC CAA ATATGCTATT T GCACAC GC A TTCCTTGCAT 9420 

CTGGAAGTGT TGCATCAGTT CAT T C C AAAA ATTTGGTGCA AC AAT T AC AG GATACTCAAC 94 80 

GAGTAT CAGC CGGATTTGGT GAGTTTGAAA TTTAGGAAAC ATTTGGATGA AAT GT AT T T T 954 0 

TTAAAAATAG AT CAGCTTTA TTTATTTGAA AAAAAACGCT CATTAAT CAA TAGTGATAGT 9600 

TCCATTCTGA GTTTCTTCTT CTTCCTCGCG GAATACAATT TTTGACTTGT TCGCATCCTT 9660 

CTTGTGTACT TTGTCACCAA T CTT CT CATC AACTAAATCT CGAAACTGAA AAAATTT CAA 972 0 

AATTATTCCA AAAAAT ATT G AT GCAGACTA CCTTTTTGAT GGCTTCTGGT ACGTTTCTAG 97 8 0 

CGTCGAATGG ATTGGCTCCT CCAATAATTA AAGTCTCGTT CGGTAGTTTA GCCAGACGGA 984 0 

CGGTGTGCTT CAACATTTTT CT AATTAAT C T AT TT CAATT CAAGT CACT C ACTCTCTCTT 9900 

GACGTCTTCT TCTATATTCC AAGAACT CT G CAGAAAATCC GTGTCCGCCT TGTGTGTTTC 9960 

TAGTTGGCGT CGGAGGATTC ACGGGTCCAA GAC GAAT GGA T GT CTAAAAA AT GTTATATT 10020 

TTTGCATAAA GAAAACAC C A TACCTTCACC ACTTTTTGAG TTGTGGGCGT T CT GAAT G GA 10080 

ATTGATCGAT TATTATT GCT CTTTCTTGAT TTGCTTCTAT CAGCTGCGTA ATGAGGTGTT 1014 0 

CTAAAGATCA GCTTTAATTC AT TT GGACAA GTGCTCCTCT AAT AAACT T A CCCTGTACTC 102 00 

ATTTTTGAAA CGATTTAC GA T G AT AAG AT T GAAAGT GGAA GTTAAATTTA GTCTTTCAAA 102 60 

GTTGAAATAA AAT CTT CAT A AATAAATAAA TTTAAAT GAA AGATTAAATA AATTAACGTT 1032 0 

CACGTAGTTA AAAAAATAAT TT AAAT CTT A ACTTCTAATA AAAAATCTCA ATTTTCCAGG 10380 

ACTCGCATTC GTGTTCAAAA GTATTTTCCG GCTGGAACTC AACTACACGT AT C CAT T GAA 10440 

ATATGTGCTC GGCGATT CAT TGCTCGGTGG ATTCCATATT GGAGCTGGTG TCAACTTCTT 10500 

GT AGAGAT T A ATTGGATGCA AGCACCCCTC AAAAAG AT T T TTTTGAAAAA CGATAAATTC 10560 

ACAGAATTTC AGTTCTTTTT CTCCCCCTTT TAT T GT TAT T TT CAT CGTAA TGCTGTGCTA 10620 

GAAGT CAGAG T AAAT AT GAG TTTTTTTGTG TTCTAGGAAT TCCATTTTTT CAGGAAGCAA 10680 

AT TTAATAAA AATTATCGAA TTTCTTGCTC TAAAGATGTT GTACATTTTA TGGAAATGTT 10740 

CGTATAGTAA TTCGAACACT TTATATTTCT CGTTTTAAAA CTGTCGGTGT TTTATAGTAA 10800 
ACTATCTTCA GAAAAAAATG AGCCTACGAA AAATCAATTT CGTAACTGGA AACGTGAAGA 10860 

AG CTT GAAGA AGTCAAGGCT AT TTT GAAGA ATTTCGAGGT AAAAT AT AT T T GAT AT TAT T 10 920 

CGAACGCGAA ATTTTGCGCC AAAAGTACGA TGCCTGGTCT CAACACGACA ATATTTTGTT 109 80 

AAATACAAAC GAATGTGCGC CTTCAAAGAA AAGTTTCAAT CTTTCGTTGC CGTGGAGATA 11040 

TTTTTAGAGT TTTTGTTTAA ATTATATATT TGTCGTATCG AAACCGGGTA CCGTAATCAA 11100 

TCAATTAAAT ATTTTCAGGT TTCAAACGTG GATGTCGATT TGGATGAATT CCAAGGAGAA 11160 

CCCGAATTTA TTGCCGAAAG AAAGTGCCGT GAGGCTGTTG AAGCTGTAAA AGGGCCCGTT 11220

TTGGTATGGA AAATTGTATT TGTTCTAAAA ATTGTCAAAT TTCAGGTCGA AGACACAAGT 11280

TTATGCTTCA ACGCAATGGG CGGTCTTCCT GGACCTTATA TCAAGTGGTT TTTGAAGAAT 11340

TTGAAACCAG AAGGACTACA TAATATGCTA GGTAAATATT TTAATTTTTT GAAAAAACTT 11400

ATTTTTCAGC CGGATTTTCT GACAAAACCG CCTATGCTCA ATGCATCTTT GCGTACACTG 11460

AAGGACTCGG AAAACCTATT CATGTATTTG CTGGTATGAT TTTTTGAATT TAATTCTTTA 11520

ATTTTATATG TTAATTTAGT TGTTTCATTC CTCAATTTAT GAGAGATTTT TTTTTCAATT 11580

TTTCTATTTC AGGAAAATGT CCTGGTCAAA TTGTTGCTCC ACGTGGTGAT ACTGCTTTTG 11640

GATGGGATCC ATGCTTCCAG CCAGATGGTT TTAAAGAAAC ATTCGGAGAA ATGGATAAAG 11700

ATGTAAAAAA TGAAATTTCT CATCGTGCAA AGGCTCTGGA ACTCCTCAAG GAATATTTTC 11760

AGAATAATTA AATTATTTTT TCTCATCTAT GCAATTTCTT GAAAATTTGT TAAGTTTCCG 11820

TTGTTATGCA TTTGCTTTTA TTTAAAAAAA AAAGAATATT TTTACATTAA TATTAGATAT 11880

GAGAAAAGAG TAATTTCTGG ATTTTAACCT TCCTACAAAA GAATATTTAT ATTTTTTGTA 11940

TGATTTTTTA AAAATATCGT CAGGAAATAA TAACATTTCA GATATACCCT GAACTCTACA 12000

GTTTATGATA TTCAGGAAAT TTCTGAATTT TCTGAAACCT TACAAAATGC GAACGGATCC 12060

GATTATTTTC GTGATTGGGT GCACTGGAAC CGGGAAAAGT GATCTTGGAG TGGCAATTGC 12120

AAAGAAATAT GGAGGAGAGG TGATTAGTGT AGATTCAATG CAATTTTATA AAGGTACATG 12180

GGTTTTGTTT CAATTTTAAA TTAATTAATT TTCGTTTTTC AGGACTTGAC ATTGCCACGA 12240

ATAAGATAAC GGAAGAAGAA TCTGAAGGGA TTCAACATCA TATGATGTCA TTTTTGAATC 12300

CATCTGAATC ATCATCTTAT AATGTACATA GTTTCCGAGA AGTCACGTTG GATCTTATTA 12360

AAGTGCTTAA TTCGCCACTT TTTGAACTTG ATCCTAATTT TCATAATTTT CAGAAAATCC 12420

GCGCCCGTTC AAAAATTCCT GTAATTGTCG GAGGAACCAC TTATTATGCT GAAAGTGTCC 12480

TTTATGAGAA TAATCTGATT GAAACCAACA CTTCAGATGA CGTGGATTCC AAATCGAGAA 12540

CATCATCAGA ATCGTCATCT GAAGACACTG AAGAAGGAAT TAGTAATCAA GAATTATGGG 12600

ATGAATTGAA AAAAATCGAC GAAAAATCAG CACTTCTTCT ACATCCAAAT AATCGTTATC 12660

GAGTACAGAG AGCATTGCAA ATTTTCAGAG AAACTGGTAA TTGATTTGCA AATTTCCAGA 12720

TTAAAAACAA ATCAAGTAAA GTTTTTTGCA GGAATCCGAA AAAGTGAACT TGTTGAAAAA 12780

CAGAAATCAG ATGAAACTGT TGATTTGGGT GGACGACTAC GATTTGATAA TTCTTTAGTT 12840

ATTTTTATGG ATGCAACACC TGAAGTTTTA GAAGAAAGAC TTGATGGAAG AGTTGATAAA 12900

ATGATTAAAT TGGGTTTGAA GAATGAATTG ATCGAGTTTT ATAACGAGGT AAATATTTGA 12960

ATTTTTCCAG AAAAAAAAAG AAAATTTTTT ATTATTTTGT TTTTTTTTCA TTCTTTACTA 13020

TTTTCCAAAA AAGTTTAAAC TTTTGAAAAC TGTTCAGAAA ATGTTCGTGT ATTTATTTTA 13080

GCTTACTGAG GCATTATTTC ATTGTGATTT TTACTATACT CTATAAACTA AATTTTCAGC 13140

ACGCCGAGTA CATAAATCAC AGCAAATATG GTGTCATGCA ATGTATTGGT CTTAAAGAAT 13200

TCGTTCCATG GCTCAATTTG GACCCATCAG AAAGAGATAC ACTCAATGGG GATAAATTGT 13260

TCAAGCAAGG GTAATTTAAA TTTATTTTCA ATTTTTATAA ATTCCAAGCT ATTTTCAGAT 13320

GCGATGATGT GAAGCTTCAC ACTCGACAAT ATGCACGGCG CCAGAGACGG TGGTATCGAT 13380

CGAGACTTTT AAAACGGTCG GATGGTGATC GGGTATGTTG ATTTTAAAAA AATTGAATTT 13440

TTAAAGAACT TTTTTACTAA ATTAACAAAG TTATTGGCTG AAAATGGCTG AAAATTATAG 13500

TAAAACTAAT CAAAAAAATT GAAATTTTGA ATTAAAGTCA TAAAGTGACG ACCAGAAAAT 13560

TAAAAAAAAA CATTTTTCTA TTTTAATTAA TTCACTCTAC TTCACTTTAA AAATAATTTT 13620

CAGAAAATGG CAAGTACAAA AATGCTGGAT ACATCTGACA AGTACCGAAT AATTAGTGAT 13680

GGAATGGACA TTGTTGATCA ATGGATGAAT GGAATCGATC TATTTGAAGA TGTAAAATTT 13740

CACAAATTCT AAAATTTCCG AATCACAAAT TAAAATTTCT ACAGATCTCC ACAGACACCA 13800

ATCCAATTCT AAAAGGGTCC GATGCAAATA TTCTGCTGAA TTGTGAAATC TGTAATATTT 13860

CAATGACTGG AAAAGATAAT TGGTTTGTTT CAATACATAT TATAATTTCG AAATGAATTT 13920

TTTCAGGCAG AAACATATCG ATGGGAAAAA GCACAAGCAT CATGCTAAGC AAAAGAAATT 13980

GGCAGAGACT CGCACATAAG ACGCTATATT TATTTTTTGT TAACTTAAAT TATTTTTGTT 14040

GTTGATTGTT CTCTAAATAA AAAAACAGCT CAGAGAGAAG ATTAGGCGCT CGTCCACATC 14100

TCCGACGATA GTCAACCCGA ACGAAGGGAA CTATCTTTAA TTGTCAGTGA TGACGTCATG 14160

TCGTCAAGAA CTCGTCATAG CTGTGAGAAT TGAACCATTA TAGATTTGGA CATTAGTTTA 14220

GGTTATATCC AGTACACTAA ATGGTACATG ATAGACAGTG TACATTTACA GATTTATAGA 14280

TTGTCTCAGT GACTAGTTAC CGGAAGAGGA GAGGAGAACA TGTGGCGATG TCTTTTGGAT 14340



WO 99/10482 



6/26 



PCT/CA98/00803 



CGATATTATT CCGTCTGAAA ATTGTTCACT AGGGGGACTG CCGATTACCA CTTCACATGA 14400

CGGAACATGT TAGTTAAAAT ATTGGCTTTT ATACACATTT TCAAAATAGC ACCTGTAT 14458



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 430 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:



Met Ile Phe Arg Lys Phe Leu Asn Phe Leu Lys Pro Tyr Lys Met Arg
1 5 10 15

Thr Asp Pro Ile Ile Phe Val Ile Gly Cys Thr Gly Thr Gly Lys Ser
20 25 30

Asp Leu Gly Val Ala Ile Ala Lys Lys Tyr Gly Gly Glu Val Ile Ser
35 40 45

Val Asp Ser Met Gln Phe Tyr Lys Gly Leu Asp Ile Ala Thr Asn Lys
50 55 60

Ile Thr Glu Glu Glu Ser Glu Gly Ile Gln His His Met Met Ser Phe
65 70 75 80

Leu Asn Pro Ser Glu Ser Ser Ser Tyr Asn Val His Ser Phe Arg Glu
85 90 95

Val Thr Leu Asp Leu Ile Lys Lys Ile Arg Ala Arg Ser Lys Ile Pro
100 105 110

Val Ile Val Gly Gly Thr Thr Tyr Tyr Ala Glu Ser Val Leu Tyr Glu
115 120 125

Asn Asn Leu Ile Glu Thr Asn Thr Ser Asp Asp Val Asp Ser Lys Ser
130 135 140

Arg Thr Ser Ser Glu Ser Ser Ser Glu Asp Thr Glu Glu Gly Ile Ser
145 150 155 160

Asn Gln Glu Leu Trp Asp Glu Leu Lys Lys Ile Asp Glu Lys Ser Ala
165 170 175

Leu Leu Leu His Pro Asn Asn Arg Tyr Arg Val Gln Arg Ala Leu Gln
180 185 190

Ile Phe Arg Glu Thr Gly Ile Arg Lys Ser Glu Leu Val Glu Lys Gln
195 200 205

Lys Ser Asp Glu Thr Val Asp Leu Gly Gly Arg Leu Arg Phe Asp Asn
210 215 220

Ser Leu Val Ile Phe Met Asp Ala Thr Pro Glu Val Leu Glu Glu Arg
225 230 235 240

Leu Asp Gly Arg Val Asp Lys Met Ile Lys Leu Gly Leu Lys Asn Glu
245 250 255

Leu Ile Glu Phe Tyr Asn Glu His Ala Glu Tyr Ile Asn His Ser Lys
260 265 270

Tyr Gly Val Met Gln Cys Ile Gly Leu Lys Glu Phe Val Pro Trp Leu
275 280 285

Asn Leu Asp Pro Ser Glu Arg Asp Thr Leu Asn Gly Asp Lys Leu Phe
290 295 300

Lys Gln Gly Cys Asp Asp Val Lys Leu His Thr Arg Gln Tyr Ala Arg
305 310 315 320

Arg Gln Arg Arg Trp Tyr Arg Ser Arg Leu Leu Lys Arg Ser Asp Gly
325 330 335

Asp Arg Lys Met Ala Ser Thr Lys Met Leu Asp Thr Ser Asp Lys Tyr
340 345 350

Arg Ile Ile Ser Asp Gly Met Asp Ile Val Asp Gln Trp Met Asn Gly
355 360 365

Ile Asp Leu Phe Glu Asp Ile Ser Thr Asp Thr Asn Pro Ile Leu Lys
370 375 380

Gly Ser Asp Ala Asn Ile Leu Leu Asn Cys Glu Ile Cys Asn Ile Ser
385 390 395 400

Met Thr Gly Lys Asp Asn Trp Gln Lys His Ile Asp Gly Lys Lys His
405 410 415

Lys His His Ala Lys Gln Lys Lys Leu Ala Glu Thr Arg Thr
420 425 430







(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2041 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CTGCCATAAG ATGGCGTCCG TGGCGGCTGC ACGAGCAGTT CCTGTGGGCA GTGGGCTCAG 60

GGGCCTGCAA CGGACCCTAC CTCTTGTAGT GATTCTCGGG GCCACGGGCA CCGGCAAATC 120

CACGCTGGCG TTGCAGCTAG GCCAGCGGCT CGGCGGTGAG ATCGTCAGCG CTGACTCCAT 180

GCAGGTCTAT GAAGGCCTAG ACATCATCAC CAACAAGGTT TCTGCCCAAG AGCAGAGAAT 240

CTGCCGGCAC CACATGATCA GCTTTGTGGA TCCTCTTGTG ACCAATTACA CAGTGGTGGA 300

CTTCAGAAAT AGAGCAACTG CTCTGATTGA AGATATATTT GCCCGAGACA AAATTCCTAT 360

TGTTGTGGGA GGAACCAATT ATTACATTGA ATCTCTGCTC TGGAAAGTTC TTGTCAATAC 420

CAAGCCCCAG GAGATGGGCA CTGAGAAAGT GATTGACCGA AAAGTGGAGC TTGAAAAGGA 480

GGATGGTCTT GTACTTCACA AACGCCTAAG CCAGGTGGAC CCAGAAATGG CTGCCAAGCT 540

GCATCCACAT GACAAACGCA AAGTGGCCAG GAGCTTGCAA GTTTTTGAAG AAACAGGAAT 600

CTCTCATAGT GAATTTCTCC ATCGTCAACA TACGGAAGAA GGTGGTGGTC CCCTTGGAGG 660

TCCTCTGAAG TTCTCTAACC CTTGCATCCT TTGGCTTCAT GCTGACCAGG CAGTTCTAGA 720

TGAGCGCTTG GATAAGAGGG TGGATGACAT GCTTGCTGCT GGGCTCTTGG AGGAACTAAG 780

AGATTTTCAC AGACGCTATA ATCAGAAGAA TGTTTCGGAA AATAGCCAGG ACTATCAACA 840

TGGTATCTTC CAATCAATTG GCTTCAAGGA ATTTCACGAG TACCTGATCA CTGAGGGAAA 900

ATGCACACTG GAGACTAGTA ACCAGCTTCT AAAGAAAGGA CCTGGTCCCA TTGTCCCCCC 960

TGTCTATGGC TTAGAGGTAT CTGATGTCTC GAAGTGGGAG GAGTCTGTTC TTGAACCTGC 1020

TCTTGAAATC GTGCAAAGTT TCATCCAGGG CCACAAGCCT ACAGCCACTC CAATAAAGAT 1080

GCCATACAAT GAAGCTGAGA ACAAGAGAAG TTATCACCTG TGTGACCTCT GTGATCGAAT 1140

CATCATTGGG GATCGCGAAT GGGCAGCGCA CATAAAATCC AAATCCCACT TGAACCAACT 1200

GAAGAAAAGA AGAAGATTGG ACTCAGATGC TGTCAACACC ATAGAAAGTC AGAGTGTTTC 1260

CCCAGACTAT AACAAAGAAC CTAAAGGGAA GGGATCCCCA GGGCAGAATG ATCAAGAGCT 1320

GAAATGCAGC GTTTAAGAGA CATGTCCAGT GGCCTTTGGA AAGGTGGTGG GGATCCAGTT 1380

CAGGAGGGAG GGGTATGTTT GTCTCCCAGT CTGGGCAAAG GAGTGCTATG CGGAATTCTC 1440

TGCATAGCAG AAAAGCTCCC ACCATTTTCT TTTGATGTGG TTTTAAAGTC TCACGTTCTC 1500

TATAATAGAA ACAGCAGGTC TTGTCAGCTC CTTGTGTGGC TGATGTGTCT GGAAATGATG 1560

TAGTTCAGGA AAGCATTTTT TTTTTCTTTG AACCTTAAAG GTTCTATTAT TAAAAGCAGC 1620

ACAGATTCCA CATTTTTATA CATGAGGATC TTCTTTGTGG TGAATACCAG GATTGACTGC 1680

ATCCCTTTAA AAGAAGTTTT ATGTCCCTGA CTCTGGCTAA AATTATCTAA TTTCCAGATG 1740

CTTTTGTAGA TGACTGAAGT ATTTGTGAGC CACATATTGG GAGTTCTAGA TTTGAGTGAA 1800

TGGCAGGAAA GGGCCATCTC CATTGAGATG ATTAAGTGAA CCAAACTAGT TCTCGGAATT 1860

CTACAGAGAA GGAGGGAATC AGACTGAGGA AGCTGTGACA TAGGACTTGA AGACCAAAGA 1920

CTTTGAAATT TGCGAGCTGC TCATGTGTGA GTTATTATCA CTGCTGTCTT TCTATTGAGT 1980

TACAAATCTA TATTTTTATT GAAGTTTAAA TAAAGAAAAA ATTTACAAGA AAAAAAAAAA 2040

A 2041



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 892 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



Met 


Phe 


Arg 


Lys 


Leu 


yjj- y 


Ser 


Ser 


L^r _1_ _y 


Del 


Leu 


i rp 


Lys 


Pro 


Lys 


As n 


1 








c 










± u 














Pro 


His 


Ser 


Leu 


^_LU 


Tyr 


Leu 


Lys 


Tyr 


Leu 


ul 11 


ol y 


V d_L 


Leu 


1 11 X 


Lys 








<c U 










£ o 










o u 






Asn 


IitJ-U 


Lys 


V o._L 


± 11 i 


Lj_L U. 


rto 11 


As n 


Lys 


j_jy s 


lie 


Jjeu 


V ci_l_ 


w_L LL 




Leu 
















a n 










A R 








Arg 


a 


11c 




OJ_ Li 


J — L tr 


Leu 


He 


1 L P 


Gly 


Asp 


Gin 


Asn 


As p 


Ala 


Ser 


D u 




















fin 










vax 




Asp 


Jr X it: 


r lit; 


Leu 


Glu 


Arg 


Gin 


Met 


Leu 


Leu 


T\7T 

o. y j- 


Phe 


Leu 


Lys 


DO 










70 










75 










80 


Tip 




Li 


Gin 


Gly 


Asn 


Thr 


Pro 


Leu 


Asn 


Val 


Gin 


Leu 


Leu 


Gin 


Thr 










85 










90 










95 




Leu Asn Ile Leu Phe Glu Asn Ile Arg His Glu Thr Ser Leu Tyr Phe
100 105 110

Leu Leu Ser Asn Asn His Val Asn Ser Ile Ile Ser His Lys Phe Asp
115 120 125

Leu Gln Asn Asp Glu Ile Met Ala Tyr Tyr Ile Ser Phe Leu Lys Thr
130 135 140

Leu Ser Phe Lys Leu Asn Pro Ala Thr Ile His Phe Phe Phe Asn Glu
145 150 155 160

Thr Thr Glu Glu Phe Pro Leu Leu Val Glu Val Leu Lys Leu Tyr Asn
165 170 175

Trp Asn Glu Ser Met Val Arg Ile Ala Val Arg Asn Ile Leu Leu Asn
180 185 190

Ile Val Arg Val Gln Asp Asp Ser Met Ile Ile Phe Ala Ile Lys His
195 200 205

Thr Lys Glu Tyr Leu Ser Glu Leu Ile Asp Ser Leu Val Gly Leu Ser
210 215 220

Leu Glu Met Asp Thr Phe Val Arg Ser Ala Glu Asn Val Leu Ala Asn
225 230 235 240

Arg Glu Arg Leu Arg Gly Lys Val Asp Asp Leu Ile Asp Leu Ile His
245 250 255

Tyr Ile Gly Glu Leu Leu Asp Val Glu Ala Val Ala Glu Ser Leu Ser
260 265 270

Ile Leu Val Thr Thr Arg Tyr Leu Ser Pro Leu Leu Leu Ser Ser Ile
275 280 285

Ser Pro Arg Arg Asp Asn His Ser Leu Leu Leu Thr Pro Ile Ser Ala
290 295 300

Leu Phe Phe Phe Ser Glu Phe Leu Leu Ile Val Arg His His Glu Thr
305 310 315 320

Ile Tyr Thr Phe Leu Ser Ser Phe Leu Phe Asp Thr Gln Asn Thr Leu
325 330 335


Thr


Thr 


His 


Trp 


He 


Arg 


His 


Asn 








o4 U 










Thr 


Leu 


Ser 


Ser 


Pro 


Thr 


Gly 


Glu 






355 










360 


Phe 


Asp 


Phe 


Leu 


Leu 


Glu 


Ala 


Phe 




370 










375 




Lys 


Ala 


Phe 


Tyr 


Gly 


Leu 


Met 


Leu 


O O R 

Job 










oy U 






Ala 


Asp 


Val 


Gly 


Glu 


Leu 


Leu 


Ser 










4 05 








Glu 


Ser 


Thr 


Thr 


Thr 


Ser 


Leu 


Ala 








42 0 










lie 


Ala 


Ser 


rpv 

i nr 


Ser 


Ser 


lie 


Ser 






4 o D 










a a n 


lie 




V d _L 


ul Li 


Al ^ 


J. ilX. 


ri~\ t 


(^1 11 




4jU 










/I cc 

4 jj 




Glu 


Glu 


Gin 


Thr 


Leu 


Glu 


Asp 


Leu 


4 65 










A 1 A 
4 / U 






GlU 


Asn 


Ser 


a 


lie 


Ser 


Asp 


Pro 










4 O O 








Ser 


Arg 


Ser 


Arg 


ir ne 


bin 


Ser 


j\-L a 








ri n 
o u u 










Thr 


Ser 


biiy 


Cys 


Asp 




Arg 


Leu 






olo 










con 
DZ U 


Lys 


Ala 


vai 


t^iy 


rp V, 

i nr 


Asp 


Asp 


Asn 














jjD 




Leu 


.Mia 


Cys 


Leu 


val 


lie 


Arg 


LjIII 


^A ^ 
D4 O 










J o u 






Lys 


Val 


His 


Thr 


Ser 




Thr 


Lys 










JU J 








Leu 


Leu 


Ser 


Ser 


lie 


Gly 


Gin 


Tvr 








JO u 












i rp 


JTlltr 


LA 




w_L Li 


1 Y I - 








R Q R 

J J J 










600 


Phe 


Asp 


lie 


lie 


Gly 


His 


Glu 


Met 




61 0 

_L W 










615 




Leu 


Ser 


Asn 


Leu 


Leu 


Leu 


His 


Lys 


625 










630 








Tip 
lie 


r\±. y 






Tl ^ 

J — Lc 


Val 


Phe 










645 








Arg 


Asp 


Leu 


Thr 


Gly 


Glu 


Gly 


Asp 








660 










As n 


Ser 


Asp 


Gin 


Glu 


Pro 


Val 


Ala 






675 










680 


Asn 


Ser 


Asp 


Leu 


Leu 


Ser 


Cvs 


Thr 




690 










695 




Leu 


Gly 


Lys 


Pro 


Gly 


Asp 


Arg 


Leu 












71 0 






Leu 


Gin 


Leu 


He 


Leu 


Val 


Glu 


Pro 










725 








lie 


Val 


Arg 


Phe 


Val 


Gly 


Leu 


Leu 








740 










Ser 


Thr 


Asp 


Ser 


Lys 


Val 


Leu 


His 






755 










760 


Arg 


lie 


Lys 


Lys 


Arg 


His 


Pro 


Val 




770 










775 




Asp 


His 


lie 


Arg 


Cys 


Met 


Ala 


Ala 


785 










790 







GlU 


Lys 


Tyr 


Cys 


Leu 


bilU 


Pro 


lie 


345 










0 c n 
OO U 






Tyr 


Val 


Asn 


GlU 


Asp 


His 


Val 


rne 








365 








Asp 


Ser 


Ser 


Gin 


Ala 


Asp 


Asp 


Ser 








o o n 










lie 


Tyr 


Ser 


Met 


Jrne 


pi « 
biin 


As n 


Asn 






one 

o y o 










4 00 


Ala 


Ala 


Asn 


Phe 


Pro 


Val 


Leu 


Lys 




410 










A "1 EL 

41o 




Gin 


Gin 


Asn 


Leu 


Ala 


Arg 


Leu 


Arg 


425 










4 0 U 






Lys 


Arg 


Thr 


Arg 


Ala 


He 


Thr 


Glu 










4 4 O 








Asp 


bill 


T "I « 

lie 


r ne 


ni 0 


Asp 


vai 


Pro 








4 bU 










val 


Asp 


Asp 


vai 


Leu 


vai 


Asp 


Thr 






A r 7 U. 










/ion 
4 0 U 


blU 


Pro 


Lys 


Asn 


vai 


blU 


JCl 


bilU 














4 y 0 




val 


Asp 


fin 
CalU 


Leu 


Pro 


Pro 


Pro 


Ser 


c n 

DUO 










JlU 






Phe 


Asp 


Ala 


Leu 


Ser 


oer 


lie 


lie 










cot; 

DAD 








Arg 


lie 


Arg 


Pro 


lie 


1 nr 


Leu 


CjIU 








a n 
04 u 










Tl Q 


Leu 




J. 1 1 X. 


Val 


A.sp 


Asp 








R R R 

*J vJ 










5 60 


Leu 


CyS 


Phe 


Glu 


Val 


Arg 


Leu 


Lys 




c i7 0 










575 




vai 


Asn 


oxy 


w_L LI 




Leu 


IT 1 1 tr 


T 

Leu 












590 






Glu 


Phe 


Glu 


Val 


Asn 


His 


Val 


Asn 










605 








Leu 


Leu 


P ro 


P ro 


Ala 


Ala 


Thr 


Pro 








62 0 










Arg 


Leu 


Pro 


Ser 


Glv 


Phe 


Glu 


Glu 






635 










640 


Tvr 


Leu 


His 


lie 


Arg 


Lys 


Leu 


Glu 




650 










655 




Thr 


Glu 


Leu 


Pro 


Val 


Arg 


Val 


Leu 


665 










670 






He 


Gly 


Asp 


Cys 


lie 


Asn 


Leu 


His 










685 








Val 


Val 


Pro 


Gin 


Gin 


Leu 


Cys 


Ser 








700 










Ala 


Arg 


Phe 


Leu 


Val 


Thr 


Asp 


Arg 






71 R 










720 


Asp 


Ser 


Arg 


Lys 


Ala 


Gly 


Trp 


Ala 




730 










735 




Gin 


Asp 


Thr 


Thr 


He 


Asn 


Gly 


Asp 


745 










750 






Val 


Val 


Val 


Glu 


Gly 


Gin 


Pro 


Ser 










765 








Leu 


Thr 


Ala 


Lys 


Phe 


He 


Phe 


Asp 








780 










Lys 


Gin 


Arg 


Leu 


Thr 


Lys 


Gly 


Arg 






795 










800


Gln Thr Ala Arg Gly Leu Lys Leu Gln Ala Ile Cys Ser Ala Leu Gly
805 810 815

Val Pro Arg Ile Asp Pro Ala Thr Met Thr Ser Ser Pro Arg Met Asn
820 825 830

Pro Phe Arg Ile Val Lys Gly Cys Ala Pro Gly Ser Val Arg Lys Thr
835 840 845

Val Ser Thr Ser Ser Ser Ser Ser Gln Gly Arg Pro Gly His Tyr Ser
850 855 860

Ala Asn Leu Arg Ser Ala Ser Arg Asn Ala Gly Met Ile Pro Asp Asp
865 870 875 880

Pro Thr Gln Pro Ser Ser Ser Ser Glu Arg Arg Ser
885 890



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 355 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



Met Ala Glu Lys Ala Glu Asn Leu Pro Ser Ser Ser Ala Glu Ala Ser
1 5 10 15

Glu Glu Pro Ser Pro Gln Thr Gly Pro Asn Val Asn Gln Lys Pro Ser
20 25 30

Ile Leu Val Leu Gly Met Ala Gly Ser Gly Lys Thr Thr Phe Val Gln
35 40 45

Arg Leu Thr Ala Phe Leu His Ala Arg Lys Thr Pro Pro Tyr Val Ile
50 55 60

Asn Leu Asp Pro Ala Val Ser Lys Val Pro Tyr Pro Val Asn Val Asp
65 70 75 80

Ile Arg Asp Thr Val Lys Tyr Lys Glu Val Met Lys Glu Phe Gly Met
85 90 95

Gly Pro Asn Gly Ala Ile Met Thr Cys Leu Asn Leu Met Cys Thr Arg
100 105 110

Phe Asp Lys Val Ile Glu Leu Ile Asn Lys Arg Ser Ser Asp Phe Ser
115 120 125

Val Cys Leu Leu Asp Thr Pro Gly Gln Ile Glu Ala Phe Thr Trp Ser
130 135 140

Ala Ser Gly Ser Ile Ile Thr Asp Ser Leu Ala Ser Ser His Pro Thr
145 150 155 160

Val Val Met Tyr Ile Val Asp Ser Ala Arg Ala Thr Asn Pro Thr Thr
165 170 175

Phe Met Ser Asn Met Leu Tyr Ala Cys Ser Ile Leu Tyr Arg Thr Lys
180 185 190

Leu Pro Phe Ile Val Val Phe Asn Lys Ala Asp Ile Val Lys Pro Thr
195 200 205

Phe Ala Leu Lys Trp Met Gln Asp Phe Glu Arg Phe Asp Glu Ala Leu
210 215 220

Glu Asp Ala Arg Ser Ser Tyr Met Asn Asp Leu Ser Arg Ser Leu Ser
225 230 235 240

Leu Val Leu Asp Glu Phe Tyr Cys Gly Leu Lys Thr Val Cys Val Ser
245 250 255


Ser


Ala 


Thr 


Glv 


Glu 


Glv 


Phe 


Glu 








260 










Ser 


Val 


Glu 


Ala 


Tyr 


Lys 


Lys 


Glu 






275 










280 


Leu 


Ala 


Glu 


Lys 


Lys 


Leu 


Leu 


Asp 




290 










295 




Glu 


Glu 


Thr 


Leu 


Lys 


Gly 


Lys 


Ala 


305 










310 






Asn 


Pro 


Asp 


Glu 


Phe 


Leu 


Glu 


Ser 










325 








lie 


His 


Leu 


Gly 


Gly 


Val 


Asp 


Glu 



340 



Glu Arg Ser 
355 



Asp 


vai 


jxie r. 


Thr 


A_La 


lie 


Asp 


GlU 


D £Z CL 










o *~7 r\ 






Tyr 


Val 


Pro 


Met 


Tyr 


Glu 


Lys 


Val 










285 








Glu 


Glu 


Glu 


Arg 


Lys 


Lys 


Arg 


Asp 








300 










Val 


His 


Asp 


Leu 


Asn 


Lys 


Val 


Ala 






315 










320 


Glu 


Leu 


Asn 


Ser 


Lys 


He 


Asp 


Arg 




330 










335 




Glu 


Asn 


Glu 


Glu 


Asp 


Ala 


Glu 


Leu 


345 










350 







(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 434 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



Met Ser Glu Lys Thr Phe His Lys Ala Gln Thr Ile Arg Ala Lys Ala
1 5 10 15

Ser Gly Val Pro Ser Ile Val Glu Ala Val Gln Phe His Gly Val Arg
20 25 30

Ile Thr Lys Asn Asp Ala Leu Val Lys Glu Val Ser Glu Leu Tyr Arg
35 40 45

Ser Lys Asn Leu Asp Glu Leu Val His Asn Ser His Leu Ala Ala Arg
50 55 60

His Leu Gln Glu Val Gly Leu Met Asp Asn Ala Val Ala Leu Ile Asp
65 70 75 80

Thr Ser Pro Ser Ser Asn Glu Gly Tyr Val Val Asn Phe Leu Val Arg
85 90 95

Glu Pro Lys Ser Phe Thr Ala Gly Val Lys Ala Gly Val Ser Thr Asn
100 105 110

Gly Asp Ala Asp Val Ser Leu Asn Ala Gly Lys Gln Ser Val Gly Gly
115 120 125

Arg Gly Glu Ala Ile Asn Thr Gln Tyr Thr Tyr Thr Val Lys Gly Asp
130 135 140

His Cys Phe Asn Ile Ser Ala Ile Lys Pro Phe Leu Gly Trp Gln Lys
145 150 155 160

Tyr Ser Asn Val Ser Ala Thr Leu Tyr Arg Ser Leu Ala His Met Pro
165 170 175

Trp Asn Gln Ser Asp Val Asp Glu Asn Ala Ala Val Leu Ala Tyr Asn
180 185 190

Gly Gln Leu Trp Asn Gln Lys Leu Leu His Gln Val Lys Leu Asn Ala
195 200 205

Ile Trp Arg Thr Leu Arg Ala Thr Arg Asp Ala Ala Phe Ser Val Arg
210 215 220

Glu Gln Ala Gly His Thr Leu Lys Phe Ser Leu Glu Asn Ala Val Ala
225 230 235 240






Val 


Asp 


Thr 


Arg 


Asp 

Z ft D 


Arg 


Pro 


He 


Leu 


Ala 

Z O U 


Ser 


Arg 


Gly 


lie 


Leu 

ore; 


Ala 


Arg 


Phe 


Ala 


Gin 


Glu 


Tyr 


Ala 


Gly 


Val 


Phe 


Gly Asp 


Ala 


Ser 


Phe 


Val 








9 SO 










9 R 

ZD*-) 










on c\ 
z / u 






Lys 


Asn 


Thr 


Leu 


Asp 


Leu 


Gin 


Ala 


Ala 


Ala 


xr r (J 


Leu 


Pro 


Leu 


Gly 


Phe 




Z / o 










z o u 










opt; 
ZOO 








He 


Leu 

o q n 
z y u 


a 




Ser 


ir ne 


<j_Ln 
z y o 


ai a 


Lys 


IT "J d 


Leu 


Lys 
300 




Leu 


bxy 


Asp 


Arg 


(jjXU 


Vdl 


111 S 


Tic 


Leu 


Asp 


Arg 


Cys 


iyr 


Leu 


K3±y 


ur j_ y 






Asp 






















315 










jZU 


Val 


Arg 


Gly 


Phe 


Gly 
oZ o 


Leu 


Asn 


Thr 


He 


Gly 


Vdl 


Lys 


Ala 


Asp 


Asn 

*3 *3 R 
o o O 


Ser 


Cys 


Leu 


Gly 


Gly 
o u 


Gly 


Ala 


Ser 


Leu 


Ala 

O *± D 


Gly 


Val 


Val 


His 


Leu 


Tyr 


Arg 


Pro 


Leu 


He 
355 


Pro 


Pro 


Asn 


Met 


Leu 
360 


Phe 


Ala 


His 


Ala 


Phe 
365 


Leu 


Ala 


Ser 


Gly 


Ser 


Val 


Ala 


Ser 


Val 


His 


Ser 


Lys 


Asn 


Leu 


Val 


Gin 


Gin 


Leu 


Gin 


370 










375 










380 










Asp 


Thr 


Gin 


Arg 


Val 


Ser 


Ala 


Gly 


Phe 


Gly 


Leu 


Ala 


Phe 


Val 


Phe 


Lys 


385 










390 










395 










400 


Ser 


He 


Phe 


Arg 


Leu 
405 


Glu 


Leu 


Asn 


Tyr 


Thr 
410 


Tyr 


Pro 


Leu 


Lys 


Tyr 
415 


Val 


Leu 


Gly 


Asp 


Ser 
420 


Leu 


Leu 


Gly 


Gly 


Phe 
425 


His 


He 


Gly 


Ala 


Gly 
430 


Val 


Asn 



Phe Leu 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:



Met Leu Tyr Ile Leu Trp Lys Leu Asn Tyr Leu Gln Lys Lys Met Ser
1 5 10 15

Leu Arg Lys Ile Asn Phe Val Thr Gly Asn Val Lys Lys Leu Glu Glu
20 25 30

Val Lys Ala Ile Leu Lys Asn Phe Glu Val Ser Asn Val Asp Val Asp
35 40 45

Leu Asp Glu Phe Gln Gly Glu Pro Glu Phe Ile Ala Glu Arg Lys Cys
50 55 60

Arg Glu Ala Val Glu Ala Val Lys Gly Pro Val Leu Val Glu Asp Thr
65 70 75 80

Ser Leu Cys Phe Asn Ala Met Gly Gly Leu Pro Gly Pro Tyr Ile Lys
85 90 95

Trp Phe Leu Lys Asn Leu Lys Pro Glu Gly Leu His Asn Met Leu Ala
100 105 110

Gly Phe Ser Asp Lys Thr Ala Tyr Ala Gln Cys Ile Phe Ala Tyr Thr
115 120 125

Glu Gly Leu Gly Lys Pro Ile His Val Phe Ala Gly Lys Cys Pro Gly
130 135 140

Gln Ile Val Ala Pro Arg Gly Asp Thr Ala Phe Gly Trp Asp Pro Cys
145 150 155 160

Phe Gln Pro Asp Gly Phe Lys Glu Thr Phe Gly Glu Met Asp Lys Asp
165 170 175

Val Lys Asn Glu Ile Ser His Arg Ala Lys Ala Leu Glu Leu Leu Lys
180 185 190

Glu Tyr Phe Gln Asn Asn
195



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
CGAACACTTT ATATTTCTCG 20

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GATAGTTCCC TTCGTTCGGG 20 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
TTTCTGGATT TTAACCTTCC 20 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 
TTTCCGAGAA GTCACGTTGG 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 
TACAGGAATT TTTGAACGGG 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 
CTTCAGATGA CGTGGATTCC 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 
GGAATCCGAA AAAGTGAACT



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15 
AAGAGATACA CTCAATGGGG

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16 
ATCGATACCA CCGTCTCTGG



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
TTGAATCTAC ACTAATCACC 20



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CCAATTATCT TTTCCAGTCA 20



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
ACATTATAAA GTTACTGTCC 20

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
TTTTAGTTAA AGCATTGACC 20 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
ACATCTTTAT CCATTTCTCC 20 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
TGCAAAGGCT CTGGAACTCC 20 



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
AAAAACCACT TGATATAAGG 20

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CATCCAAAAG CAGTATCACC 20

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
TTAATTGGAT GCAAGCACCC C 21

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
ATTACTATAC GAACATTTCC 20

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
TTGTAAAGGC GTTAGTTTGG 20

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CAGGAGTATT TGGTGATGCG 20

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
CGACGGGGAG AAGGTGACGG 20

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
AAAACTTCTA CCAACAATGG 20

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
CGTAATCTCT CTCGATTAGC 20

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CCGTGGGATG GCTACTTGCC 20

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
TGGATTTGTG GCACGAGCGG 20

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
TTGATTGCCT CTCCTCGTCC 20

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
ATCAACATCT GATTGATTCC 20

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
CAGCGAGCGC ATGCAACTAT ATATTGAGCA GG 32



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
AATAAATATT TAAATATTCA GATATACCCT GAACTCTACA G 41



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
AAACTGTAGA GTTCAGGGTA TATCTGAATA TTTAAATATT TATTC 45



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
GTACGTGGAG CTCTGCAACT ATATATTGAG CAGG 34

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
ATGACACTGC AGGATAGTTC CCTTCGTTCG GG 32

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
GTGTTGCATC AGTTCATTCC 20 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GCTGTGCTAG AAGTCAGAGG 20

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
GTTCTCCTTG GAATTCATCC 20

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
AGTATATCTA GATGTGCGAG TCTCTGCCAA TT 32

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
AGTAATTGTA CATTTAGTGG 20

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
ATTAACCTTA CTTACTTACC 20

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
CTAAACTAAG TAATATAACC 20






(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
GTTGATTCTT TGAGCACTGG 20



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
AATTCGACCA ATTACATTGG 20



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
AACATAGTTG TTGAGGAAGG 20



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
AATTAATGGA GATTCTACGG 20

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
TCAGCATCTA GAAATGCAGG 20



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
CGAATGTCAA CATTCACTGG 20



(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
CTTAACCTGA TGTGTACTCG 20



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
ATGAAGCTTT AGAGGATGCC 20

(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
CGACGAATTT CTGGAGTCGG 20



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
ACTGCATTAT CCATTAATCC 20



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
CACCCAAATA ACATCTATCC 20 



(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
TTTAACCTCA TCTTCGCTGG 20

(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
ATGTTCCGCA AGCTTGGTTC 20

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
TTTAATTACC CAAGTTTGAG 20

(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
TTTTAACCCA GTTACTCAAG 20